Method and Apparatus for Introducing Information into a
Data Stream and Method and Apparatus for Encoding an Audio
Signal



Description

Field of the Invention 

The present invention relates, in general, to audio signals
and, in particular, to introducing information into a data
stream having spectral values that represent a short-term
spectrum of an audio signal. Especially in the field of
copyright protection for audio signals, the present
invention serves to introduce copyright information, for
example, into an audio signal as inaudibly as possible.



Background of the Invention and Prior Art 

With the increasing spread of the Internet, music piracy
has also drastically increased. At many locations on the
Internet, pieces of music or, in general, audio signals can
be downloaded. Copyrights are considered in only very few
cases. In particular, the authorisation of the author, i.e.,
whether he wants to offer his work or not, is very rarely
obtained. Fees due to the author for lawful copying are
rarely paid. Apart from that, uncontrolled copying of works
takes place which, in most cases, also happens without
consideration of copyrights.

When music is lawfully purchased from a provider of music
via the Internet, the provider usually produces a header
into which copyright information as well as, for example, a
customer ID are introduced, the customer ID uniquely
referring to the present purchaser. It is further known to
introduce copy allowance information into that header, which
signals the diverse types of copyrights, for example, that
copying of the current piece is completely forbidden, that
copying of the current piece is allowed only once, that
copying of the current piece is totally free, etc.

The customer has a decoder that reads in the header and
that, in compliance with the allowed actions, for example,
allows only one copy and refuses further copies.

This concept for consideration of copyrights, however, only 
works for customers who behave legally. 

Illegal customers usually have considerable creative
potential to "crack" pieces of music that are provided with
a header. This reveals the disadvantage of the described
procedure for the protection of copyrights. Such a header
can be removed easily. Alternatively, an illegal user could
also modify individual entries in the header, for example,
to change the entry "copying forbidden" to an entry "copying
totally free". It is also conceivable that an illegal
customer removes his own customer ID from the header and
then offers the piece of music on his or another homepage on
the Internet. From that moment onwards, it is no longer
possible to identify the illegal customer, since he has
removed his customer ID. Attempts to prevent such violations
of copyright will, therefore, inevitably be useless, since
the copy information has been removed from the piece of
music or has been modified, and since the illegal customer
who has done that can no longer be identified and called to
account. If, instead, a secure introduction of information
into the audio signal existed, then government authorities
who prosecute copyright violations could trace suspicious
pieces of music on the Internet and, for example, could
establish the user identification of such illegal pieces in
order to put a stop to the illegal users.

From WO 97/33391, an encoding method for introducing an
inaudible data signal into an audio signal is known. There,
the audio signal into which the inaudible data signal is to
be introduced is converted into the frequency domain in
order to determine the masking threshold of the audio signal
using a psychoacoustic model. The data signal to be
introduced into the audio signal is multiplied by a pseudo
noise signal in order to create a frequency-spread data
signal. The frequency-spread data signal is then weighted
with the psychoacoustic masking threshold, such that the
energy of the frequency-spread data signal will always be
below the masking threshold. Finally, the weighted data
signal is superimposed on the audio signal, whereby an audio
signal is created into which the data signal is inaudibly
introduced. On the one hand, the data signal can be used to
establish the range of a transmitter. On the other hand,
the data signal can be used for the identification of audio
signals in order to easily identify possible pirate copies,
since every sound carrier, for example, a compact disc, is
provided with an individual identification ex works. Further
described applications of the data signal include the remote
control of audio devices, analogous to the "VPS" method on
television.
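
The frequency spreading described above can be illustrated
by the following minimal sketch. It is not taken from
WO 97/33391; the function names, the chip length of 64 and
the seeded random generator used as pseudo noise source are
assumptions made purely for illustration. Each data bit is
multiplied by a pseudo noise sequence, and a receiver that
knows the same sequence recovers the bit by correlation.

```python
import random

def pn_sequence(length, seed):
    """Pseudo noise chips of +1/-1 (hypothetical generator choice)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, chips):
    """Map each bit to +1/-1 and multiply it by the whole chip sequence."""
    out = []
    for b in bits:
        symbol = 1 if b else -1
        out.extend(symbol * c for c in chips)
    return out

def despread(signal, chips):
    """Recover bits by correlating each chip-length segment with the chips."""
    n = len(chips)
    bits = []
    for start in range(0, len(signal) - n + 1, n):
        segment = signal[start:start + n]
        corr = sum(s * c for s, c in zip(segment, chips))
        bits.append(1 if corr > 0 else 0)
    return bits

chips = pn_sequence(64, seed=1)
data = [1, 0, 1, 1, 0]
tx = spread(data, chips)          # frequency-spread data signal
recovered = despread(tx, chips)   # correlation receiver, noise-free case
```

Because the correlation sums over all chips of a bit, the
data signal can remain decodable even when each individual
chip is far weaker than the audio signal superimposed on it.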


This method is highly secure against music pirates since,
on the one hand, they are probably not aware that the piece
of music that they are copying is marked. Apart from that,
it is almost impossible to extract the data signal, which is
inaudibly present in the audio signal, without an authorised
decoder.

Audio signals are 16-bit PCM samples when they come from a
compact disc. A music pirate could, for example, manipulate
the sampling rate or the levels or phases of samples to
make the data signal unreadable, i.e., undecodable, whereby
the copyright information would also be removed from the
audio signal. This, however, will not be possible without
significant quality losses. Data that are introduced into
audio signals in such a way can therefore, analogous to
bank notes, also be referred to as "watermarks".

The method described in WO 97/33391 for introducing an
inaudible data signal into an audio signal works on audio
samples that are present as time domain samples. It is
therefore necessary that audio pieces, i.e., pieces of
music, radio plays, etc., be present as a sequence of time
samples in order to be provided with a watermark. This has
the disadvantage that this method cannot be used for
already-compressed data streams that have been processed,
for example, according to one of the MPEG methods. This
means that a provider of pieces of music who wants to
provide the pieces of music with a watermark prior to
shipment to the customer has to store the pieces of music as
a sequence of PCM samples. As a result, the music provider
needs a very high storage capacity. However, it would be
desirable to use the very effective audio compression
methods already for storing the audio data at the provider.



A provider of audio data of the above-described type
could, of course, simply compress all pieces of music, for
example, by using the standard MPEG-2 AAC (ISO/IEC 13818-7),
and then decompress them fully again before the audio piece
is to be provided with a watermark, in order to again have a
sequence of audio samples that will then be fed into a known
apparatus for introducing an inaudible data signal in order
to introduce a watermark. This requires a significant effort
in that, prior to the introduction of information into the
audio signal, a full decompression or decoding is necessary.
Such a decoding costs time and money. A much more serious
drawback, however, is the fact that in such a procedure
tandem encoding effects occur.

A further disadvantage of this procedure is that, due to
the fact that the watermark is introduced into the PCM data,
there is no certainty as to whether the watermark is still
present after an audio compression. When PCM data provided
with watermarks are encoded at a relatively low bit rate,
the encoder introduces a lot of quantizing noise due to the
relatively low bit rate, which will, in an extreme case,
lead to the fact that no watermark can be decoded anymore.
It is also problematic that with this procedure the bit rate
of the audio encoder that encodes the PCM data provided with
watermarks is not known in advance, which is why no reliable
control of the ratio between watermark energy and noise
energy due to the quantizing noise is possible.

It is known that audio encoding methods according to one of
the MPEG standards are not lossless encoding methods, but
lossy encoding methods. Bit savings in comparison to direct
transmission of audio samples in the time domain are
achieved, to a large part, by making use of psychoacoustic
masking effects. In particular, for a block of, for example,
2048 audio samples, the psychoacoustic masking threshold
will be established as a function of frequency, whereupon,
after a time/frequency transformation of the audio samples,
the quantizing of the spectral values forming the short-term
spectrum will be carried out under consideration of this
psychoacoustic masking threshold. In other words, the
quantizer step size is controlled such that the noise energy
introduced by quantizing is smaller than or equal to the
psychoacoustic masking threshold. In areas of the audio
signal where the masking index, i.e., the ratio of audio
signal energy to the psychoacoustic masking threshold, is
very small, for example, in very noisy areas of the audio
signal, the spectral values need to be only roughly
quantized, without audible interferences occurring after a
subsequent decoding. In other areas where the audio signal
is very tonal, it has to be quantized more finely, such that
relatively small noise energy results from the quantizing,
since the masking index is very large.
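
The relation between quantizer step size and masking
threshold can be sketched as follows. The sketch assumes a
uniform quantizer, for which the noise energy per spectral
value is approximately the squared step size divided by 12;
actual MPEG encoders use a non-uniform quantizer and
iteration loops, so the function names and the threshold
values below are illustrative assumptions only.

```python
import math

def max_step_size(masking_threshold_energy):
    """Largest uniform step whose noise energy (step^2 / 12)
    does not exceed the given masking threshold energy."""
    return math.sqrt(12.0 * masking_threshold_energy)

def quantize(value, step):
    """Uniform quantization to an integer index."""
    return round(value / step)

def dequantize(index, step):
    """Inverse of quantize, up to the quantization error."""
    return index * step

# Hypothetical thresholds per spectral value: a noisy band tolerates
# far more noise energy than a very tonal band.
noisy_band_threshold = 4.0
tonal_band_threshold = 0.01

coarse = max_step_size(noisy_band_threshold)   # rough quantization
fine = max_step_size(tonal_band_threshold)     # fine quantization
```

Under this approximation, a band with a hundredfold higher
masking threshold may be quantized with a tenfold larger
step size.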


It becomes clear from the above that, due to the quantizing
procedure, information of the original audio signal is
lost. This does not matter when the quantized audio signal
is decoded again, since the noise energy due to the
quantizing has been distributed in such a way that it
remains below the psychoacoustic masking threshold and will,
therefore, be inaudible when an ideal psychoacoustic model
has been used. These considerations, however, always apply
only to a certain short-term spectrum or to a block of, for
example, 2048 consecutive audio samples, respectively. After
the decoding, however, the block of audio samples no longer
comprises any information about how the block formation was
performed. When the known apparatus for introducing
information has been used which, in most cases, has a
certain delay compared to an audio encoder that does not
introduce information, it can therefore not be assumed that
the same block partitioning takes place by chance. Instead,
the block partitioning, the short-term spectrum creation and
the quantizing will take place in a totally different block
raster. A renewed decoding will then usually lead to
clearly audible interferences, since it does not refer to
the same short-term spectrum, but to different short-term
spectra. This appearance of audible interferences through
two encoding/decoding stages due to their different
partitioning of the stream of audio samples into blocks is
referred to as the tandem encoding effect.

It should be noted that, in general, introducing the
inaudible data signal introduces noise energy into the audio
signal, which already includes noise energy because the
quantizing procedure is not infinitely fine. Introducing the
inaudible data signal therefore tends to lead to a
deterioration of the audio quality unless special
precautions are taken. In this connection, a further
introduction of noise energy due to the tandem encoding
effects previously described is even less desirable, since
this quality loss appears systematically without any
benefit, while small quality deteriorations due to the
watermarks are more acceptable, since the watermark also has
an advantage. Tandem encoding effects, however, only cause
interferences and have no advantage at all.

U.S. Patent No. 5,687,191 discloses a concept for
transmitting hidden data after data compression. An audio
signal is converted into sub-band samples via a sub-band
encoder, wherein each sub-band filter generates a sequence
of time samples whose spectral bandwidth is the same as the
bandwidth of the respective sub-band filter. A data stream
with such quantized sub-band samples will be unpacked and
demultiplexed in order to perform an inverse quantizing,
such that sub-band samples will be present again. Further, a
pseudo noise spread sequence is filtered by a sub-band
filter bank to obtain, for every filter of the sub-band
filter bank, a sequence of time sub-band samples having a
bandwidth determined by the respective sub-band filter. The
data to be transported will be subjected to a forward error
correction and a power control ensuring that the auxiliary
data signal is below the quantizing noise floor of the audio
sub-band samples. The auxiliary data values processed in
this way will then be combined with respective sub-band
values of the pseudo noise spread sequence via respective
modulators and then XORed with the unpacked sub-band values
of the audio signal. The combined sub-band values obtained
in this way will then be quantized again and packed in order
to obtain an output data stream.


Summary of the Invention 

It is the object of the present invention to provide a
concept that makes it possible to provide audio pieces with
a watermark, while the effects of the watermark on the audio
quality should be as low as possible.

In accordance with a first aspect of the invention, this
object is achieved by a method for introducing information
into a data stream including data about spectral values
representing a short-term spectrum of an audio signal,
including:

processing the data stream to obtain the spectral values of
the short-term spectrum of the audio signal;

combining the information with a spread sequence to obtain
a spread information signal;

generating a spectral representation of the spread
information signal to obtain a spectral spread information
signal;

establishing psychoacoustic maskable noise energy as a
function of frequency for the short-term spectrum of the
audio signal, wherein the psychoacoustic maskable noise
energy is smaller than or equal to the psychoacoustic
masking threshold of the short-term spectrum;

weighting the spectral spread information signal by using
the established noise energy to generate a weighted
information signal, wherein the energy of the introduced
information is substantially equal to or below the
psychoacoustic masking threshold;

summing the weighted information signal with the spectral
values of the short-term spectrum of the audio signal to
obtain sum spectral values including the short-term
spectrum of the audio signal and the information; and

processing the sum spectral values to obtain a processed
data stream including the data about the spectral values of
the short-term spectrum of the audio signal and the
information to be introduced.

In accordance with a second aspect of the invention, this
object is achieved by a method for generating a short-term
spectrum of the audio signal including a plurality of
spectral values, comprising: computing the psychoacoustic
masking threshold of the audio signal using a psychoacoustic
model; quantizing the spectral values considering the
psychoacoustic masking threshold so that the noise energy
introduced by quantizing is smaller than the psychoacoustic
masking threshold by a predetermined amount; and forming a
bit stream including values corresponding to the quantized
spectral values of the short-term spectrum.

In accordance with a third aspect of the invention, this
object is achieved by an apparatus for introducing
information into a data stream including data about spectral
values representing a short-term spectrum of an audio
signal, including: a processor for processing the data
stream to obtain the spectral values of the short-term
spectrum of the audio signal; a combiner for combining the
information with a spread sequence to obtain a spread
information signal; a generator for generating a spectral
representation of the spread information signal to obtain a
spectral spread information signal; an establisher for
establishing psychoacoustic maskable noise energy as a
function of frequency for the short-term spectrum of the
audio signal, wherein the psychoacoustic maskable noise
energy is smaller than or equal to the psychoacoustic
masking threshold of the short-term spectrum; a weighter for
weighting the spectral spread information signal by using
the established noise energy to generate a weighted
information signal, wherein the energy of the introduced
information is substantially equal to or below the
psychoacoustic masking threshold; a summer for summing the
weighted information signal with the spectral values of the
short-term spectrum of the audio signal to obtain sum
spectral values including the short-term spectrum of the
audio signal and the information; and another processor for
processing the sum spectral values to obtain a processed
data stream including the data about the spectral values of
the short-term spectrum of the audio signal and the
information to be introduced.

In accordance with a fourth aspect of the invention, this
object is achieved by an apparatus for encoding an audio
signal, including: a generator for generating a short-term
spectrum of the audio signal including a plurality of
spectral values; a calculator for computing a psychoacoustic
masking threshold of the audio signal using a psychoacoustic
model; a quantizer for quantizing spectral values
considering the psychoacoustic masking threshold so that the
noise energy introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined amount;
and a bitstream formatter for forming a bit stream including
values corresponding to the quantized spectral values of
the short-term spectrum.

This object is achieved by a method for introducing
information into a data stream according to claim 1, a
method for encoding an audio signal according to claim 11 or
12, an apparatus for introducing information according to
claim 13 and an apparatus for encoding an audio signal
according to claim 15 or 16.

The present invention is based on the finding that a
complete decoding before inserting the watermark has to be
dispensed with. Instead, a data stream including spectral
values representing a short-term spectrum of an audio signal
will, according to the invention, only be partly "unpacked"
until the spectral values are present. The unpacking is,
however, not a complete decoding, but only a partial
decoding in which all the information about the block
formation or the block raster used in the original encoder,
respectively, is left untouched.

This is achieved by carrying out the inventive method with
spectral values and not with time samples. The information
which is to be introduced into the audio signal will be
combined with a spread sequence in the sense of a spread
spectrum modulation in order to obtain a spread information
signal. Afterwards, a spectral representation of the spread
information signal will be generated, for example, by a
filter bank, an FFT, an MDCT or similar, in order to obtain
a spectral spread information signal. Then, a psychoacoustic
maskable noise energy will be established as a function of
frequency for the short-term spectrum of the audio signal in
order to weight the spectral spread information signal by
using the established noise energy, so that a weighted
information signal can be generated, the energy of which is
substantially equal to or below the psychoacoustic masking
threshold. After that, the weighted information signal will
be added to the spectral values of the short-term spectrum
of the audio signal in order to obtain sum spectral values
including the short-term spectrum of the audio signal and,
additionally, the introduced information. Finally, the sum
spectral values will be processed again in order to obtain
a processed data stream including the data about the
spectral values of the short-term spectrum of the audio
signal and the information to be introduced. In the case of
an MPEG-AAC encoder, the processing of the sum spectral
values will, again, include the quantizing and entropy
encoding, for example, by using a Huffman code.
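
The spectral representation, weighting and summing steps
just described might be sketched as follows. This is a
simplified illustration under stated assumptions, not the
claimed implementation: an unwindowed MDCT stands in for the
spectral transform, the maskable noise energy is given per
band, and all names, band edges and numeric values are
hypothetical. Entropy decoding, quantizing and bit stream
formatting are omitted.

```python
import math

def mdct(block):
    """Unwindowed MDCT: 2N time samples -> N real spectral values."""
    n_half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]

def weight_to_bands(spectrum, band_edges, maskable_energy):
    """Scale each band so that its energy equals the maskable
    noise energy established for that band."""
    weighted = list(spectrum)
    bands = zip(band_edges, band_edges[1:])
    for (lo, hi), target in zip(bands, maskable_energy):
        energy = sum(v * v for v in spectrum[lo:hi])
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        for k in range(lo, hi):
            weighted[k] = spectrum[k] * gain
    return weighted

def embed(audio_spectrum, spread_block, band_edges, maskable_energy):
    """Return (sum spectral values, weighted information signal)."""
    spread_spectrum = mdct(spread_block)
    weighted = weight_to_bands(spread_spectrum, band_edges, maskable_energy)
    summed = [a + w for a, w in zip(audio_spectrum, weighted)]
    return summed, weighted

# Hypothetical example: 8 spectral values in two bands of 4 lines each.
audio = [9.0, 7.5, -3.0, 2.0, 0.4, -0.2, 0.1, 0.05]
spread_block = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1]
summed, weighted = embed(audio, spread_block, [0, 4, 8], [0.5, 0.02])
```

Since only the spread information signal is transformed,
the block raster of the audio spectral values is never
changed in this sketch.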

It is to be noted that the block raster provided by the
original encoder, which produced the data stream, will thus
not be touched. Therefore, no tandem effects that would lead
to a loss of audio quality will occur. Apart from that, it
is preferred that in the processing that takes place after
the weighting and that comprises quantizing, the same
quantizing step size(s) as in the original bit stream is/are
used, which has the advantage that the very
computing-intensive iteration loops of the quantizer do not
need to be computed again. Further, no tandem encoding
effects occur that would otherwise be unavoidable, since in
the case of a renewed computation, more or less strongly
differing quantizing step sizes could occur.

The inventive introduction of a watermark directly into a
data stream enables, for example, the introduction of a
customer ID during the delivery of the music to a customer,
since the procedure can be executed on modern personal
computers at a multiple of real-time speed because, among
other things, the expensive frequency/time transformation,
which would be needed with a complete decoding, is not
needed.

A further advantage of the present invention is that the
music provider does not have to store the PCM samples, but
can store pre-encoded data streams, which can save a factor
on the order of 12 in storage space, and that the provider
can still introduce customer-specific watermarks without
the occurrence of additional tandem encoding effects which
would lead to an audio quality loss.
The inventive procedure can easily be implemented, since
only an additional time/frequency transformation of the
spread information signal is necessary. A further
significant advantage is that the inventive method has good
interoperability, i.e., that standard data streams can be
processed and that the same watermark decoder can be used
for watermarks according to the known methods and for
watermarks according to the inventive method. Finally, it
is a further advantage that an audio encoder can no longer
erase the watermark, since an exact control of the ratio
between quantizing noise and watermark energy exists.

It is to be noted that it is, of course, possible to remove
the watermark illegally when the data stream provided with
the watermark is decoded and then encoded again, but only
at a low bit rate. In this case, the noise energy introduced
by the quantizer will exceed the watermark energy, so that
no watermark can be extracted from the audio signal anymore.
This is not a problem, however, since the audio quality of
the audio signal has decreased so strongly due to the high
quantizing noise that such a poor audio signal does not have
to be protected any longer. If the watermark in an audio
signal is destroyed, then its quality is also destroyed.

The psychoacoustic maskable noise energy can be established
in different ways. The first option is to use a
psychoacoustic model for establishing the psychoacoustic
maskable noise energy, which generates the psychoacoustic
masking threshold as a function of frequency from the
short-term spectrum. A plurality of psychoacoustic models
exists; those psychoacoustic models which work with spectral
values of the short-term spectrum anyway are especially
advantageous, since these spectral values are directly
present due to the partial unpacking of the data stream.
However, other psychoacoustic models that were developed for
time domain data can be used alternatively, wherein here, in
contrast to the above-described option, a frequency/time
transformation would be necessary. Although calculating a
psychoacoustic model in order to obtain the psychoacoustic
masking threshold of the short-term spectrum is relatively
intensive in computing time, this possibility does offer the
decisive advantage that no tandem encoding effects will be
generated, since the block raster will not be touched.

Another option for establishing the psychoacoustic maskable
noise energy, which is more favourable with regard to
computing time, is to generate the data stream in such a way
that it comprises, apart from the spectral values and the
usual side information, also the psychoacoustic masking
threshold as a function of frequency for every short-term
spectrum. Establishing the psychoacoustic maskable noise
energy is then done simply by extracting the psychoacoustic
masking threshold transmitted in the data stream. With
this possibility and the possibility described above where
the psychoacoustic model is computed, the psychoacoustic
maskable noise energy is the psychoacoustic masking
threshold itself. The disadvantage of transmitting the
psychoacoustic masking threshold in the data stream is the
fact that a special audio encoder is needed, since common
audio encoding does not transmit the psychoacoustic masking
threshold, but only the spectral values and the respective
scale factors. In closed systems, however, compatibility
with standard data streams is not required. Therefore, this
option can be implemented here with little effort and
favourable computing time.

Another possibility is to provide a special audio encoder
whose quantizer always functions in such a way that the
quantizing noise is lower than the psychoacoustic masking
threshold by a predetermined amount. This means that the
encoder is designed so that its quantizer quantizes a bit
more finely than it would usually have to, such that
additional noise energy can be added without any noise being
audible. This additional noise energy can then be "used up"
when introducing the information into the data stream. In
the case of an optimum psychoacoustic model, this
possibility leads to a data stream with an introduced
watermark that has suffered no quality deterioration at all.
The disadvantage of this method is, as with the direct
transmission of the psychoacoustic masking threshold, the
fact that this method is not compatible with common
encoders.
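
The energy that such an encoder keeps free for the
watermark can be estimated with a small calculation. The
sketch below assumes that the predetermined amount is
expressed in dB and that quantizing noise and watermark are
uncorrelated, so that their energies add; both assumptions,
as well as the function names, are illustrative and not
taken from the description.

```python
def encoder_noise_target(threshold_energy, headroom_db):
    """Quantizing noise energy the special encoder aims for:
    the masking threshold lowered by headroom_db."""
    return threshold_energy / (10.0 ** (headroom_db / 10.0))

def watermark_budget(threshold_energy, headroom_db):
    """Energy left for the watermark so that quantizing noise plus
    watermark together just reach the masking threshold."""
    return threshold_energy - encoder_noise_target(threshold_energy, headroom_db)

# Hypothetical band: masking threshold energy 1.0, encoder keeps 3 dB free.
noise = encoder_noise_target(1.0, 3.0)
budget = watermark_budget(1.0, 3.0)
```

With 3 dB of headroom, roughly half of the maskable energy
is used by the quantizer and roughly half remains available
for the introduced information.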

Another possibility for establishing the psychoacoustic
maskable noise energy is to establish the noise energy that
has, in fact, been introduced by the quantizing of the
encoder which generated the data stream, and to derive from
it the noise energy used in weighting. This option assumes
that the encoder has quantized such that the noise energy
was below the psychoacoustic masking threshold or only
slightly above it. Like the method described as the first
possibility, this method can use standard bit streams, since
only the spectral values and the scale factors, both of
which are present in the data stream, are needed in order to
obtain the psychoacoustic maskable noise energy. From the
scale factors, the step size of the quantizer associated
with the respective scale factor can be established in order
to compute the noise energy introduced into a scale factor
band, which is typically equal to or below the
psychoacoustic masking threshold. The psychoacoustic
maskable noise energy for the introduced information used in
weighting can be the same as the quantizing noise energy,
but it can also be scaled by a factor greater than zero and
smaller than one, wherein a factor closer to zero leads to
less audible interferences due to the watermark, but could
make extraction more problematic than a factor closer to
one.
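
A rough sketch of this estimation is given below. It assumes
the AAC convention that a scale factor sf corresponds to a
quantizer gain of 2^(0.25 * (sf - 100)), and it approximates
the quantizing noise per spectral value by the uniform
quantizer value step^2 / 12, although the AAC quantizer is
non-uniform. The constant, the function names and the
approximation are assumptions for illustration only.

```python
SF_OFFSET = 100  # assumption: offset used in AAC inverse quantization

def step_size(scale_factor):
    """Quantizer step size associated with a scale factor."""
    return 2.0 ** (0.25 * (scale_factor - SF_OFFSET))

def estimated_quant_noise(scale_factor, lines_in_band):
    """Noise energy in a scale factor band, using the uniform
    quantizer approximation of step^2 / 12 per spectral value."""
    step = step_size(scale_factor)
    return lines_in_band * step * step / 12.0

def maskable_energy(scale_factor, lines_in_band, factor=1.0):
    """Noise energy used for weighting: the estimated quantizing
    noise scaled by a factor greater than zero and at most one."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return factor * estimated_quant_noise(scale_factor, lines_in_band)

# Hypothetical scale factor band with 12 spectral values.
full = maskable_energy(100, 12)          # same as the quantizing noise
reduced = maskable_energy(100, 12, 0.5)  # less audible, harder to extract
```

Only values already present in a standard data stream enter
this estimate, which is why no psychoacoustic model has to
be computed for this option.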




Brief Description of the Drawings 



Preferred embodiments of the present invention will be
discussed in detail below with reference to the accompanying
drawings. They show:



Fig. 1   a block diagram of an inventive apparatus for
         introducing information into a data stream;

Fig. 2   a detailed block diagram of the watermark means
         of Fig. 1;

Fig. 3a  a schematic representation of a method for
         establishing the maskable noise energy using the
         psychoacoustic model;

Fig. 3b  a schematic representation of a method for
         establishing the maskable noise energy when the
         psychoacoustic masking threshold is transmitted in
         the data stream;

Fig. 3c  a schematic representation of a method for
         establishing the maskable noise energy when the
         noise energy is estimated with the knowledge of the
         spectral values and the scale factors;

Fig. 3d  a schematic representation of a method for
         establishing the psychoacoustic maskable noise
         energy when energy in the data stream is kept free
         for the watermark; and

Fig. 4   a block diagram of an inventive audio encoder
         that either writes the psychoacoustic masking
         threshold into the data stream or writes the
         predetermined amount for the method described in
         Fig. 3d into the data stream and whose quantizer
         is controlled accordingly.



Detailed Description of Preferred Embodiments 



Before the individual Figs. are referred to in more
detail, the system-theoretical background of the present
invention will be briefly discussed. In general, the
introduction of information into the audio signal should not
lead to an audible quality deterioration of the audio
signal, or only to a barely audible one. In order to
ascertain how much energy the signal representing the
information to be introduced may have, the masking threshold
of the audio signal is continuously computed by using a
psychoacoustic model. The frequency-selective computing of
the masking threshold by using, for example, the critical
bands, as well as a plurality of further psychoacoustic
models, is known in the art. As an example, reference is
made to the standard MPEG2-AAC (ISO/IEC 13818-7).

The psychoacoustic model leads to a masking threshold for a
short-term spectrum of the audio signal. Usually, the
masking threshold will vary across frequency. By definition,
it is assumed that a signal introduced into the audio signal
will be inaudible when the energy of this signal is below
the masking threshold. The masking threshold strongly
depends on the composition of the audio signal. Noisy
signals have a higher masking threshold than very tonal
signals. The energy of the signal that is introduced into
the audio signal therefore varies strongly over time.
Usually, a certain signal/noise ratio is needed for decoding
the information introduced into an audio signal. It can
therefore happen that, with very tonal audio signal
portions, the energy of the additionally introduced signal
will become so low that the signal/noise ratio will no
longer be sufficient for reliable decoding. In such areas, a
decoder can therefore no longer decode the individual bits
correctly. From a system-theoretical point of view, the
introduction of information into an audio signal in
dependence on the psychoacoustic masking threshold can
therefore be seen as the transmission of a data signal via
a channel with strongly varying noise energy, wherein the
audio signal, i.e., the music signal, is seen as an
interference signal.

Fig. 1 shows a block diagram of an inventive apparatus or
an inventive method for introducing information into a data
stream including spectral values representing a short-term
spectrum of an audio signal. The data stream applied to the
input of a data stream demultiplexer 10 will, if it has been
encoded according to the above-mentioned MPEG AAC standard,
generally first be partitioned into spectral values on a line
12 and side information on a line 14, wherein, from the side
information, the scale factors should be particularly
mentioned here. The spectral values, which are still entropy
encoded after the demultiplexer 10, will then be fed into an
entropy decoder 16 and then into an inverse quantizer 18,
which generates the spectral values of the audio signal
representing its short-term spectrum by using the quantized
spectral values and the associated scale factors supplied to
the inverse quantizer 18 via line 14. The spectral values will
then be fed into watermark means 20, which generates sum
spectral values including the short-term spectrum of the audio
signal and, in addition, the information to be introduced.
These sum spectral values will then be fed into a quantizer 22
again and entropy encoded in a following entropy encoder 24,
in order to finally be passed to a data stream multiplexer 26,
which also receives the necessary side information such as,
for example, the scale factors. At the output of the
multiplexer 26, a processed data stream is then present which
differs from the data stream at the input of the demultiplexer
10 in that it now carries a watermark, i.e., in that
information has been introduced into it.

Before Fig. 2, which includes a detailed representation of
watermark means 20, is discussed in more detail, reference is
made, for ease of understanding, to an MPEG-2 AAC audio
encoder as described, for example, in the informative
Appendix B of the standard ISO/IEC 13818-7:1997(E). Such an
encoder is substantially based on the idea of bringing the
quantizing noise below the so-called psychoacoustic masking
threshold, i.e., of hiding it. For the transformation of the
audio samples into the frequency domain, i.e., for generating
the spectral representation of the audio signal, an analysis
filter bank is used which is realised as a critically sampled
MDCT (MDCT = modified discrete cosine transform) and which has
a degree of overlapping of 50%. Its purpose is to create a
spectral representation of the input signal that will finally
be quantized and encoded. Together with a corresponding filter
bank in the decoder, an analysis/synthesis system is thus
formed.

The psychoacoustic model used in such encoders is based on
the psychoacoustic phenomenon of masking. Both frequency-domain
masking effects and time-domain masking effects can be
modelled in this way. The psychoacoustic model provides an
estimated value for the "noise" energy that can be added to
the original audio signal without audible interference
appearing. This maximum admissible energy is referred to as
the psychoacoustic masking threshold.

The quantizer 22 and the encoder 24 in Fig. 1 will be
described below. Typically, more than one spectral line is
quantized with the same quantizer step size. Therefore,
several adjacent spectral lines are grouped into so-called
scale factor bands. The quantizer optimises the quantizer step
size for each scale factor band. The quantizer step size is
determined such that the quantizing error is below or equal to
the computed psychoacoustic masking threshold in order to make
sure that the quantizing noise is inaudible. Two constraints
have to be considered here, between which a compromise has to
be found. On the one hand, the bit consumption should be kept
as low as possible in order to obtain high compression ratios,
i.e., a high encoding gain. On the other hand, it has to be
made sure that the quantizing noise is below the
psychoacoustic masking threshold, so that no interference is
audible in the encoded and subsequently decoded audio signal.
Typically, this optimisation is carried out in an iterative
loop. The result of this loop is a quantizer step size that
corresponds uniquely to a scale factor for a scale factor
band. In other words, the spectral values of a scale factor
band are quantized with a quantizer step size that is uniquely
associated with the scale factor of that scale factor band.
This means that two different scale factors also lead to two
different quantizer step sizes.

The bit stream is composed by a bit stream multiplexer,
which mainly fulfils formatting tasks. The data stream, which
is a bit stream in the case of a binary system, thus comprises
the quantized and encoded spectral values or spectral
coefficients as well as the scale factors and further side
information, which are represented and explained in detail in
the above-mentioned MPEG-AAC standard.


Fig. 2 shows a detailed block diagram of watermark means 20
of Fig. 1. From a source 30 for information units, information
units, preferably in the form of bits, are fed into means 32
for spreading. Means 32 for spreading is based on a spread
spectrum modulation, which is especially favourable when a
pseudo-noise spread sequence is used for a correlation in the
watermark extractor. The information is combined with the
spread sequence bit by bit. The combining preferably takes
place such that, for an information bit with a logic level of
+1, the spread sequence is output unchanged at the output of
means 32, while for an information bit with a logic level of
0, which can, for example, correspond to a voltage level of
-1, the inverted spread sequence is output at the output of
means 32. Thereby, a "time signal" is generated at the output
of means 32 which comprises the spread information from the
source 30 for information. This spread information signal is
then transformed into its spectral representation by means 34
for transforming, which can be an FFT algorithm, an MDCT,
etc., but also a filter bank. The spectral representation of
the spread information signal is weighted in means 36 in order
to then be added to the spectral values in means 38, such that
the sum spectral values are present at the output of means 38,
which can then be quantized (22) and encoded (24) as described
with reference to Fig. 1 in order to be fed into the bit
stream multiplexer 26. Watermark means 20 further comprises
means 40 for establishing the maskable noise energy for the
short-term spectrum given by the spectral values.
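
The spreading in means 32 can be sketched as follows. This is a
minimal illustration only: the function names, the seeded
pseudo-random generator standing in for the pseudo-noise
sequence, and the correlation-based `despread` extractor are
assumptions made for the example and are not taken from the
specification.

```python
import random


def pn_sequence(length, seed):
    """Hypothetical pseudo-noise spread sequence of +1/-1 chips."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, pn):
    """Bit 1 emits the sequence unchanged, bit 0 emits it inverted."""
    out = []
    for bit in bits:
        sign = 1 if bit == 1 else -1
        out.extend(sign * chip for chip in pn)
    return out


def despread(signal, pn):
    """Assumed extractor: correlate each block with the sequence."""
    n = len(pn)
    bits = []
    for start in range(0, len(signal), n):
        block = signal[start:start + n]
        corr = sum(s * chip for s, chip in zip(block, pn))
        bits.append(1 if corr > 0 else 0)
    return bits


pn = pn_sequence(31, seed=1)
payload = [1, 0, 1, 1, 0]
time_signal = spread(payload, pn)
recovered = despread(time_signal, pn)
```

Each information bit is represented by a whole block of chips,
which is the redundancy that allows the watermark energy per
spectral line to stay small.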

It has to be noted that means 34 for transforming the
spread information signal preferably performs a spectral
transformation corresponding to the transformation underlying
the data stream at the input of the demultiplexer 10 (Fig. 1).
This means that means 34 for transforming preferably performs
the same modified discrete cosine transform that was
originally used for generating the unprocessed data stream.
This can easily be done, since information such as window
type, window shape, window length, etc., is transmitted as
side information in the bit stream. This connection is
indicated in Fig. 2 by the broken line coming from the bit
stream demultiplexer 10 (Fig. 1).
As already explained with reference to Fig. 1, the sum
spectral values are subjected to quantizing and encoding again
after the addition in the summator 38. The question arises
here as to how the quantizer interval, i.e., the quantizer
step size already referred to, is to be determined, i.e.,
whether the iterations have to be performed again or not.
Since the watermark energy is usually very small compared to
the audio signal energy, the same scale factors as in the
original bit stream can preferably be used. This is
represented in Fig. 1 by the connecting line 14 from
demultiplexer 10 to multiplexer 26. This means that quantizing
can be performed much more easily by the quantizer 22, since
it is no longer necessary (but still possible) to carry out
the iteration loop in order to determine an optimum compromise
between bit rate and quantizer step size. Instead, the scale
factors already known are preferably used.

In the following, the various possibilities for establishing
the noise energy maskable by the short-term spectrum, which is
needed for weighting the spectral representation of the spread
information signal, will be described. Various possibilities
exist, which will be discussed subsequently with reference to
Figs. 3a to 3d.

In Fig. 3a, a psychoacoustic model is used to compute the
psychoacoustic masking threshold of the respective short-term
spectrum by using the spectral values of the audio signal.
Since psychoacoustic models are described in the literature
and in the standard mentioned, it is only noted here that
preferably those psychoacoustic models can be used which work
with spectral data anyway or which include a time/frequency
transformation. In the latter case, the psychoacoustic model
is simplified compared to the original psychoacoustic model
underlying every encoder in that it can be "fed" immediately
with spectral values, so that no time/frequency transformation
is required in the psychoacoustic model at all. Finally, the
psychoacoustic model outputs the psychoacoustic masking
threshold for the short-term spectrum, so that in block 36
(Fig. 2) the spectrum of the spread information signal can be
shaped such that, in every scale factor band, it has an energy
which is equal to or below the psychoacoustic masking
threshold in this scale factor band. It has to be noted that
the psychoacoustic masking threshold is an energy. It is
desired that the energy of the spectral representation of the
information signal is as close to the psychoacoustic masking
threshold as possible, so that the information is introduced
into the audio signal with as much energy as possible and the
correlation peaks obtained in an extractor of the watermark
are as distinct as possible.
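
The weighting in block 36 and the summation in means 38 can be
illustrated by the following sketch. It assumes, purely for the
example, that the scale factor bands are given as a list of
start offsets (`band_offsets`) and that one threshold energy per
band is available; these names and the data layout are
hypothetical.

```python
import math


def band_energy(spec, lo, hi):
    """Energy of the spectral lines lo..hi-1."""
    return sum(v * v for v in spec[lo:hi])


def weight_to_threshold(wm_spec, band_offsets, thresholds):
    """Scale each band of the watermark spectrum so that its
    energy equals the maskable noise energy of that band."""
    out = list(wm_spec)
    for b, thr in enumerate(thresholds):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        energy = band_energy(wm_spec, lo, hi)
        gain = math.sqrt(thr / energy) if energy > 0.0 else 0.0
        for i in range(lo, hi):
            out[i] = wm_spec[i] * gain
    return out


def add_watermark(audio_spec, weighted_wm):
    """Sum spectral values: audio spectrum plus weighted watermark."""
    return [a + w for a, w in zip(audio_spec, weighted_wm)]


wm = [1.0, -1.0, 2.0, 0.5, 0.5, 0.0]
offsets = [0, 2, 6]          # two bands: lines 0-1 and lines 2-5
thresholds = [0.5, 2.0]      # maskable noise energy per band
weighted = weight_to_threshold(wm, offsets, thresholds)
audio = [10.0, 8.0, 6.0, 4.0, 2.0, 1.0]
sum_spec = add_watermark(audio, weighted)
```

Scaling to exactly the threshold uses the full permitted energy;
a smaller target per band would trade robustness for additional
safety margin.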

The first possibility, shown in Fig. 3a, has the advantage
that the psychoacoustic masking threshold can be computed
very exactly and that this method is fully compatible with
common data streams. The disadvantage is that the computation
of a psychoacoustic model is usually relatively
time-consuming. This possibility is therefore very accurate
and interoperable, but takes a lot of time.

Another possibility to obtain the psychoacoustic maskable
noise energy, shown in Fig. 3b, consists of having the encoder
that has generated the data stream at the input of the
demultiplexer 10 (Fig. 1) write the psychoacoustic masking
threshold for every short-term spectrum into the bit stream,
such that the inventive apparatus for introducing information
into a data stream merely needs to extract (40b) the
psychoacoustic masking threshold for each short-term spectrum
from the side information of the data stream in order to
output it to means 36 for weighting the spectral
representation of the spread information signal (Fig. 2). This
possibility has the advantage that it is also very exact and,
in addition, very fast, since the psychoacoustic masking
threshold only has to be accessed and not computed. However,
the interoperability is impaired, i.e., standard bit streams
cannot be provided with a watermark later, since they do not
contain psychoacoustic masking thresholds. Therefore, a
special inventive encoder as described with reference to
Fig. 4 is needed here.


Another possibility for establishing the psychoacoustic
maskable noise energy is shown in Fig. 3c. Here, the
psychoacoustic maskable noise energy is computed (40c) by
using the spectral values and the scale factors. It is assumed
that the original encoder that has generated the data stream
into which the watermark is to be introduced has already
chosen the noise energy introduced by quantizing such that it
is below or equal to the psychoacoustic masking threshold.
This method is slightly less exact than the direct computation
of the psychoacoustic masking threshold, but in comparison to
the latter it is very fast and also maintains
interoperability, i.e., it also works with standard bit
streams.



In the following, it will be addressed why the third
possibility is slightly less exact. Several encoding
approaches exist which differ, for example, in the quantizer
implementations being used. As has already been described, a
quantizer must not exceed the specified bit rate. On the other
hand, it has to observe the psychoacoustic masking threshold.
It can thus happen that a quantizer does not need the
available bit rate at all, for example because a high bit rate
is available or because a piece of music having a very high
encoding gain has to be encoded, as is the case with tonal
pieces, for example. Certain quantizers function such that
they quantize finer than necessary and thus introduce much
less noise energy into the audio signal through quantizing
than they would be allowed to. It can therefore happen that
the inventive apparatus as described with reference to Fig. 3c
assumes a psychoacoustic masking threshold that is much lower
than the actual one, which finally leads to the spectral
representation of the spread information signal having, after
weighting, much less energy than it would be allowed to have,
whereby not all of the energy available to the watermark is
used. This would, however, not be the case when a quantizer is
used which always introduces the maximum allowable noise
energy during quantizing and either does not write any
remaining bits or fills them with arbitrary values that are
not taken into consideration during decoding. In this case,
the option illustrated in Fig. 3c would be exactly as accurate
as the first two possibilities. In the case of the variable
quantizer, however, a variable bit rate is created as well. In
this case, the watermark means could also be used to make the
bit rate constant by filling up with bits representing the
watermark, so that the constant bit rate is the same as the
highest bit rate of the original data stream with variable bit
rates.

In the following, it will be addressed how the noise energy
introduced by quantizing into a scale factor band is computed
by using the spectral values, the scale factors and, in
addition, the characteristic of the quantizer. The following
equation applies to the energy $|F_{x_i}|^2$ of the quantizing
error for a spectral value $x_i$:

$$ |F_{x_i}|^2 = \frac{q^{2a}}{12\,a^2} \cdot x_i^{\,2(1-a)} $$

It has to be noted that this equation applies to irregular
(non-uniform) quantizers as provided, for example, in the
MPEG-AAC standard. For regular (uniform) quantizers, the
second term simply drops out when 1 is inserted for a.

The factor q appearing in the equation is linked to the
quantizer step size QS as follows:

$$ q = 2^{QS/4} $$

The factor a is 3/4 for the MPEG-AAC quantizer. 
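
A quantizer with this characteristic can be sketched as
follows. The sketch only illustrates the roles of q and a; the
function names are hypothetical, and plain rounding to the
nearest integer is an assumption made for the example rather
than the rounding rule of any particular encoder.

```python
import math

A = 0.75  # compression exponent a of the non-uniform quantizer


def step_factor(qs):
    """q = 2^(QS/4), the factor linked to the step size QS."""
    return 2.0 ** (qs / 4.0)


def quantize(x, qs):
    """Compress |x|/q with exponent a, round, restore the sign."""
    q = step_factor(qs)
    level = round((abs(x) / q) ** A)
    return int(math.copysign(level, x))


def dequantize(level, qs):
    """Invert the compression: |level|^(1/a) * q with the sign."""
    q = step_factor(qs)
    return math.copysign(abs(level) ** (1.0 / A) * q, level)


value = 100.0
fine = dequantize(quantize(value, 0), 0)    # small step size
coarse = dequantize(quantize(value, 8), 8)  # larger step size
```

Because of the exponent a < 1, large spectral values are
quantized more coarsely than small ones, which is why the
quantizing error energy depends on the spectral value itself.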

The energy of the quantization error in a scale factor band
is then the sum of $|F_{x_i}|^2$ over the spectral values in
that scale factor band. This energy has to be smaller than or
equal to the psychoacoustic masking threshold in this scale
factor band in order to be inaudible. It has to be noted that
the psychoacoustic masking threshold is constant within a
scale factor band, but takes different values for different
scale factor bands. For the energy $x_{min}$ of the
quantization error in a scale factor band, the following value
results:

$$ x_{min} = \sum_i \left[ \frac{2^{\frac{3}{8} \cdot QS}}{27/4} \cdot x_i^{\,1/2} \right] $$

The index i indicates that the summation always runs over
the spectral values in the scale factor band, since the
psychoacoustic masking threshold is usually given as an
energy for this scale factor band.
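
The band-wise noise energy can be evaluated as in the following
sketch, which sums the per-line error energy
q^(2a) / (12 a^2) * |x_i|^(2(1-a)) with q = 2^(QS/4). The
function name and the use of the absolute value of each
spectral line (so that negative lines are handled) are
assumptions of the example.

```python
def quantizing_noise_energy(spectral_values, qs, a=0.75):
    """Estimated quantizing noise energy of one scale factor band.

    spectral_values: spectral lines x_i of the band
    qs: quantizer step size QS of the band
    a: compression exponent (3/4 for the AAC-style quantizer,
       1 for a uniform quantizer)
    """
    q = 2.0 ** (qs / 4.0)
    factor = q ** (2.0 * a) / (12.0 * a * a)
    exponent = 2.0 * (1.0 - a)
    return sum(factor * abs(x) ** exponent for x in spectral_values)


band = [16.0, -4.0]
noise_fine = quantizing_noise_energy(band, qs=0)
noise_coarse = quantizing_noise_energy(band, qs=8)
noise_uniform = quantizing_noise_energy(band, qs=4, a=1.0)
```

For a = 3/4 the factor becomes 2^(3QS/8) / (27/4); for a = 1
every line contributes q^2 / 12, the familiar result for a
uniform quantizer.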

It has to be noted that the quantizer step sizes for the
individual scale factors are not given directly in the side
information of the data stream. However, according to the
convention specified in the AAC standard, the quantizer step
size associated with every scale factor can be uniquely
derived. Apart from that, the characteristic of the quantizer
used in the original encoder for generating the data stream
has to be known, i.e., if it is an irregular (non-uniform)
quantizer, its compression factor, which is the factor 3/4 in
the AAC standard.
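
As a worked check, the band formula follows from the per-line
error energy once q and a are substituted. The derivation below
assumes that the per-line expression has the form
q^(2a) / (12 a^2) * x_i^(2(1-a)), which is the form consistent
with the band formula for a = 3/4.

```latex
\begin{align*}
q &= 2^{QS/4}, \qquad a = \tfrac{3}{4} \\
q^{2a} &= \left(2^{QS/4}\right)^{3/2} = 2^{\frac{3}{8} QS} \\
12\,a^2 &= 12 \cdot \tfrac{9}{16} = \tfrac{27}{4} \\
2(1-a) &= \tfrac{1}{2} \\
x_{min} &= \sum_i |F_{x_i}|^2
        = \sum_i \frac{2^{\frac{3}{8} QS}}{27/4} \cdot x_i^{1/2}
\end{align*}
```

For a uniform quantizer (a = 1) the same substitution gives
q^2 / 12 per spectral line, independent of x_i.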

As already discussed, the spectral lines of the spectral
representation of the spread information signal are now
weighted so that, together, they have an energy that is
smaller than or equal to the psychoacoustic maskable noise
energy and, in the case of the option described in Fig. 3c,
equal to the noise energy of the quantizing process.


Consider the case that the noise energy introduced by
quantizing in the scale factor band is already equal to the
psychoacoustic masking threshold and that the same energy is
then introduced into the audio signal again, this time for the
information to be introduced. It can be seen that the total
energy, i.e., the noise energy due to quantizing plus the
energy of the information, can then exceed the psychoacoustic
masking threshold, which can lead to audible quality losses.
These losses will, however, be small, since the energy of the
information is limited to the psychoacoustic masking
threshold, so that the threshold is violated by a factor
larger than 1 but no larger than 2. As already explained, a
watermark energy in the order of the psychoacoustic masking
threshold will lead to interference when the quantizing noise
is already in the order of the psychoacoustic masking
threshold. It is therefore preferred to choose the
psychoacoustic maskable noise energy used for weighting such
that the total noise energy (quantizing noise plus "noise
energy" of the information) is smaller than 1.5 times the
psychoacoustic masking threshold, wherein even smaller factors
down to close to 1.0 are possible. It has to be noted that
small factors are also practical, since very high information
redundancy has already been introduced by the spreading of the
information signal.
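
The choice of the watermark energy under such a total-noise
limit can be sketched as follows. The function name, the
default factor of 1.5 and the treatment of the limit as
inclusive are assumptions made for illustration.

```python
def watermark_energy_budget(threshold, quant_noise, factor=1.5):
    """Energy the watermark may have in one scale factor band.

    threshold: psychoacoustic masking threshold of the band
    quant_noise: noise energy already introduced by quantizing
    factor: permitted total noise relative to the threshold
    """
    headroom = factor * threshold - quant_noise
    # Never exceed the threshold itself and never go negative.
    return max(0.0, min(threshold, headroom))


# Quantizing noise already uses the full threshold:
tight = watermark_energy_budget(threshold=1.0, quant_noise=1.0)
# Quantizer left most of the threshold unused:
relaxed = watermark_energy_budget(threshold=1.0, quant_noise=0.2)
```

With the quantizing noise already at the threshold, a factor of
1.5 leaves half the threshold energy for the watermark; a
factor close to 1.0 leaves correspondingly less.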

In other words, introducing a watermark into an audio signal
whose psychoacoustic masking threshold has already been fully
used up by noise energy due to quantizing leads to a slight
deterioration of the audio quality, which is, however, easily
outweighed by the advantages of the watermark.

In order to overcome this limitation, the concept shown in
Fig. 3d can be used, wherein the quantizer in the encoder is
controlled from the beginning such that the noise energy
introduced by quantizing, which is set via the quantizer step
size, always stays below the psychoacoustic masking threshold
by a predetermined amount. In other words, an audio encoder
for such a concept quantizes finer than necessary, whereby an
"energy potential" for the information to be introduced, i.e.,
for the watermark, is kept free. This has the advantage that a
watermark can be introduced entirely without quality loss
when, in establishing the psychoacoustic maskable noise energy
(40d), which is now smaller than the psychoacoustic masking
threshold by the predetermined amount, the predetermined
amount is taken into account in means 40d, so that the noise
energy due to quantizing and the energy due to the information
to be introduced are together equal to or smaller than the
psychoacoustic masking threshold. Since the weighted spectral
values of the spread information signal are summed with the
spectral values of the audio signal, the energy of the
spectral values of the information signal is, after their
weighting, equal to or smaller than the predetermined amount.
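
The encoder-side control can be illustrated with the following
sketch, which searches for the coarsest step size whose
estimated noise still leaves the predetermined amount free. It
is a simplification under stated assumptions: the noise is
estimated with the closed-form expression for an AAC-style
quantizer (a = 3/4) instead of an actual encoder iteration
loop, and the function names and the search range are
hypothetical.

```python
def noise_energy(values, qs, a=0.75):
    """Estimated quantizing noise energy of one scale factor band."""
    q = 2.0 ** (qs / 4.0)
    factor = q ** (2.0 * a) / (12.0 * a * a)
    return sum(factor * abs(x) ** (2.0 * (1.0 - a)) for x in values)


def coarsest_step(values, threshold, reserved, qs_range=range(0, 61)):
    """Largest QS whose noise stays below threshold - reserved.

    Returns None if even the finest step in qs_range is too coarse.
    """
    target = threshold - reserved
    best = None
    for qs in qs_range:
        if noise_energy(values, qs) <= target:
            best = qs
        else:
            break  # the noise estimate grows monotonically with QS
    return best


band = [16.0, 4.0]
plain = coarsest_step(band, threshold=8.0, reserved=0.0)
with_reserve = coarsest_step(band, threshold=8.0, reserved=2.0)
total = noise_energy(band, with_reserve) + 2.0
```

Reserving energy forces a finer step size, so the quantizing
noise plus a watermark of the reserved energy stays within the
masking threshold.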


This option has the advantage that a watermark can be
introduced into a data stream without any quality loss. On the
one hand, however, the interoperability suffers, since the
quantizer in the encoder always has to stay below the
psychoacoustic masking threshold by the predetermined amount
when setting the noise energy by quantizing. On the other
hand, this implementation possibility is very efficient, since
no psychoacoustic model has to be computed.


In the following, reference is made to Fig. 4, which shows
two possibilities for an encoder for audio signals that
generates a data stream which is especially suitable for
introducing information according to the invention. Such an
audio encoder can basically be constructed like a known audio
encoder, such that it comprises means 50 for generating a
spectral representation of the audio signal, a quantizer 52
for quantizing the spectral representation of the audio
signal, an entropy encoder 54 for entropy encoding the
quantized spectral values and, finally, a data stream
multiplexer 56. The data stream multiplexer 56 receives the
psychoacoustic masking threshold from a psychoacoustic model
58, which is also known, and, in contrast to a known audio
encoder, writes it into the data stream, such that the
inventive apparatus for introducing information can simply
access the psychoacoustic masking threshold in the data
stream. The encoder shown in Fig. 4 with the solid line 60 is
therefore the counterpart to the apparatus for introducing
information shown in Fig. 1 when the option shown in Fig. 3b
is used as means for establishing the maskable noise energy.

The alternative audio encoder according to the present
invention is shown in Fig. 4 in dashed lines and corresponds
to the option for means 40 shown in Fig. 3d for establishing
the maskable noise energy in the inventive apparatus shown in
Fig. 1. Here, the quantizer is controlled by a predetermined
amount such that the noise energy introduced by quantizing is
below the psychoacoustic masking threshold by the
predetermined amount, wherein the value of the predetermined
amount is fed into the data stream multiplexer 56 via the
dotted line 62 in order to be included in the data stream,
such that the inventive apparatus for introducing information
can access the predetermined amount in order to weight
accordingly (block 36 in Fig. 2).
Claims

1. Method for introducing information into a data stream
including data about spectral values representing a
short-term spectrum of an audio signal, including:

processing (10, 16, 18) the data stream to obtain the
spectral values of the short-term spectrum of the audio
signal;

combining (32) the information with a spread sequence
to obtain a spread information signal;

generating (34) a spectral representation of the
spread information signal to obtain a spectral spread
information signal;

establishing (40a; 40b; 40c; 40d) psychoacoustic
maskable noise energy as a function of frequency for the
short-term spectrum of the audio signal, wherein the
psychoacoustic maskable noise energy is smaller than or
the same as the psychoacoustic masking threshold of the
short-term spectrum;

weighting (36) the spectral spread information signal
by using the established noise energy to generate a
weighted information signal, wherein the energy of the
introduced information is substantially equal to or
below the psychoacoustic masking threshold;

summing (38) the weighted information signal with the
spectral values of the short-term spectrum of the audio
signal to obtain sum spectral values including the
short-term spectrum of the audio signal and the
information; and

processing (22, 24, 26) the sum spectral values to
obtain a processed data stream including the data about
the spectral values of the short-term spectrum of the
audio signal and the information to be introduced.

2. Method according to claim 1, wherein the data stream
comprises quantized spectral values as data about
spectral values, the step of processing the data
stream including the following sub-step:

inverse quantizing (18) the quantized spectral values
to obtain the spectral values; and

the step of processing the sum spectral values
including:

quantizing (22) the sum spectral values to obtain
quantized sum spectral values; and

forming (26) the processed data stream using the
quantized sum spectral values.

3. Method according to claim 2, wherein the quantized
spectral values in the data stream are entropy encoded,
the step of processing the data stream including the
following sub-step:

entropy-decoding (16) the entropy-encoded spectral
values to obtain the quantized spectral values; and

the step of processing the sum spectral values
including:

entropy-encoding (24) the quantized sum spectral
values.

4. Method according to claim 1,
wherein the step of establishing the psychoacoustic
maskable noise energy comprises:

computing (40a) the psychoacoustic masking threshold
as a function of frequency using a psychoacoustic model
which is based on the spectral values of the audio
signal.

5. Method according to claim 1,
wherein a masking threshold used in generating the
data stream as a function of frequency for the
short-term spectrum is present in the data stream as side
information, the step of establishing including:

extracting (40b) the psychoacoustic masking threshold
from the data stream, wherein the psychoacoustic
maskable noise energy is the same as the psychoacoustic
masking threshold.

6. Method according to claim 1,
wherein the data stream further comprises side
information including scale factors (14) by which the
spectral values will be multiplied in groups in an audio
encoder prior to quantizing, the step of processing
the data stream further including the following
sub-step:

extracting the scale factors from the data stream; and

the step of establishing including:

computing the noise energy introduced in the audio
encoder when quantizing as a function of frequency by
using the scale factors for the short-term spectrum
and by using the spectral values as well as knowing a
quantizer used in the audio encoder, the introduced
noise energy being a measure for the psychoacoustic
maskable noise energy used in weighting.

7. Method according to claim 6, wherein the data stream
is formed according to ISO/IEC 13818-7 (MPEG-2 AAC)
and the step of estimating the noise energy comprises:

establishing a quantizing step for the spectral values
from a scale factor band using the scale factor
associated with this scale factor band;

evaluating the following formula to obtain the noise
energy for the scale factor band introduced by
quantizing,

$$ x_{min} = \sum_i \left[ \frac{2^{\frac{3}{8} \cdot QS}}{27/4} \cdot x_i^{\,1/2} \right] $$

wherein $x_i$ is the i-th spectral line in a scale factor
band, QS is the quantizing step for this scale factor
band and $x_{min}$ is the noise energy introduced in the
scale factor band by quantizing;

the step of weighting (36) including:

setting the spectral values of the spectral
representation of the spread information signal in the
scale factor band such that the total energy of the set
spectral values is the same as the noise energy in
this scale factor band obtained in the step of
evaluating.



8. Method according to claim 1,
wherein the spectral values of the data stream are
quantized such that the noise energy introduced by
quantizing is smaller than the psychoacoustic masking
threshold by a predetermined amount and wherein, in
the step of establishing (40d), an energy corresponding
to the predetermined amount is established; and

wherein in the step of weighting (36) the spectral
values of the spectral representation of the spread
information signal are set such that they have an
energy corresponding to the predetermined amount.

9. Method according to claim 1, wherein the value of the
predetermined amount is present as side information in
the data stream, and wherein in the step of establishing
(40d) the value for the predetermined amount is extracted
from the side information of the data stream.



10. Method according to claim 1,
wherein in the step of processing the sum spectral
values, the same quantizing step sizes as in the
original data stream are used.



Method for encoding an audio signal, including:

generating (50) a short-term spectrum of the audio
signal including a plurality of spectral values;

computing the psychoacoustic masking threshold of the
audio signal using a psychoacoustic model (58);

quantizing (52) the spectral values considering the
psychoacoustic masking threshold, so that the noise
energy introduced by quantizing is equal to or smaller
than the psychoacoustic masking threshold; and

forming (56) a bit stream including values
corresponding to the quantized spectral values of the
short-term spectrum and additionally including the
computed psychoacoustic masking threshold for the
short-term spectrum of the audio signal.

Method for encoding an audio signal including: 

generating a short-term spectrum of the audio
signal including a plurality of spectral values;

computing the psychoacoustic masking threshold of the
audio signal using a psychoacoustic model;

quantizing the spectral values considering the
psychoacoustic masking threshold so that the noise energy
introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined amount;

forming a bit stream including values corresponding
to the quantized spectral values of the short-term
spectrum.

12. Method according to claim 12, wherein in the step
of forming, an indication for the value of the
predetermined amount is included in the bit stream.

13. Apparatus for introducing information into a data
stream including data about spectral values representing
a short-term spectrum of an audio signal, including:

a processor for processing the data stream to obtain
the spectral values of the short-term spectrum of the
audio signal;

a combiner for combining the information with a spread
sequence to obtain a spread information signal;

a generator for generating a spectral representation of
the spread information signal to obtain a spectral
spread information signal;

an establisher for establishing psychoacoustic maskable
noise energy as function of the frequency for the
short-term spectrum of the audio signal, wherein the
psychoacoustic maskable noise energy is smaller than or
equal to the psychoacoustic masking threshold of the
short-term spectrum;

a weighter for weighting the spectral spread
information signal by using the established noise
energy to generate a weighted information signal,
wherein the energy of the introduced information is
substantially equal to or below the psychoacoustic
masking threshold;

a summer for summing the weighted information signal
with the spectral values of the short-term spectrum of
the audio signal to obtain spectral values including
the short-term spectrum of the audio signal and the
information; and

another processor for processing the sum spectral
values to obtain a processed data stream including the
data about the spectral values of the short-term
spectrum of the audio signal and the information to be
introduced.



[Struck through in the original:]

Means for encoding an audio signal, including:

means for generating (50) a short-term spectrum of the
audio signal including a plurality of spectral values;

means for computing a psychoacoustic masking threshold
of the audio signal by using a psychoacoustic model
(58);

means for quantizing (52) the spectral values
considering the psychoacoustic masking threshold so that
noise energy introduced by quantizing is equal to or
smaller than the psychoacoustic masking threshold; and

means for forming (56) a bit stream including values
corresponding to the quantized spectral values of the
short-term spectrum and, additionally, including the
computed psychoacoustic masking threshold (60) for the
short-term spectrum of the audio signal.



14. Apparatus for encoding an audio signal,
including:

a generator for generating a short-term spectrum of the
audio signal including a plurality of spectral values;

a calculator for computing a psychoacoustic masking
threshold of the audio signal using a psychoacoustic
model;

a quantizer for quantizing spectral values considering
the psychoacoustic masking threshold so that the noise
energy introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined
amount;

a bitstream formatter for forming a bit stream including
values corresponding to the quantized spectral values of
the short-term spectrum.






Method and Apparatus for Introducing Information into a
Data Stream and a Method and Apparatus for Encoding an
Audio Signal



Abstract

An inventive method for introducing information into a data
stream including data about spectral values representing a
short-term spectrum of an audio signal first processes the
data stream to obtain the spectral values of the short-term
spectrum of the audio signal. In addition, the information
to be introduced is combined with a spread sequence to
obtain a spread information signal, whereupon a spectral
representation of the spread information signal is generated
and then weighted with an established psychoacoustic
maskable noise energy to generate a weighted information
signal, wherein the energy of the introduced information is
substantially equal to or below the psychoacoustic masking
threshold. The weighted information signal and the spectral
values of the short-term spectrum of the audio signal are
then summed and afterwards processed again to obtain a
processed data stream including both the audio information
and the information to be introduced. Because the
information is introduced into the data stream without
changing to the time domain, the block rastering underlying
the short-term spectrum is not touched, so that introducing
a watermark does not lead to tandem encoding effects.
Field of the Invention

The present invention relates, in general, to audio signals
and, in particular, to introducing information into a data
stream having spectral values that represent a short-term
spectrum of an audio signal. Especially in the field of
copyright protection for audio signals, the present
invention serves to introduce copyright information, for
example, into an audio signal as inaudibly as possible.



Background of the Invention and Prior Art 

With the increasing spread of the Internet, music piracy
has also drastically increased. At many locations on the
Internet, pieces of music or, in general, audio signals can
be downloaded. Copyrights are considered in only very few
cases. In particular, the author is very rarely asked
whether he wants to offer his work or not. Fees due to the
author for lawful copying are rarely paid. Apart from that,
uncontrolled copying of works takes place which, in most
cases, also happens without consideration of copyrights.

When music is lawfully purchased from a provider of music
via the Internet, the provider usually produces a header in
which copyright information as well as, for example, a
customer ID are introduced, the customer ID uniquely
referring to the present purchaser. It is further known to
introduce copy allowance information into that header, which
signals the diverse types of copyrights, for example, that
the copying of the current piece is completely forbidden,
that the copying of the current piece is only allowed once,
that the copying of the current piece is totally free, etc.

The customer has a decoder that reads in the header and
that, in compliance with the allowed actions, for example,
only allows one copy and refuses further copies.

This concept for consideration of copyrights, however, only
works for customers who behave legally.

Illegal customers usually have considerable creative
potential for "cracking" pieces of music that are provided
with a header. This reveals the disadvantage of the
described procedure for the protection of copyrights: such a
header can be removed easily. Alternatively, an illegal user
could also modify individual entries in the header, for
example, to change the entry "copying forbidden" to an entry
"copying totally free". It is also possible that an illegal
customer removes his own customer ID from the header and
then offers the piece of music on his own or another
homepage on the Internet. From that moment onwards, it is no
longer possible to identify the illegal customer, since he
has removed his customer ID. Attempts to prevent such
violations of copyright will, therefore, inevitably be
useless, since the copy information has been removed from
the piece of music or has been modified, and since the
illegal customer who has done that can no longer be
identified and called to account. If, instead, a secure
introduction of information into the audio signal existed,
then government authorities who prosecute copyright
violations could trace suspicious pieces of music on the
Internet and, for example, could establish the user
identification of such illegal pieces in order to put a stop
to the illegal users.

From WO 97/33391, an encoding method for introducing an
inaudible data signal into an audio signal is known. There,
the audio signal into which the inaudible data signal is to
be introduced is converted into the frequency domain in
order to determine the masking threshold of the audio signal
using a psychoacoustic model. The data signal to be
introduced into the audio signal is multiplied with a pseudo
noise signal in order to create a frequency-spread data
signal. The frequency-spread data signal is then weighted
with the psychoacoustic masking threshold, such that the
energy of the frequency-spread data signal will always be
below the masking threshold. Finally, the weighted data
signal is superimposed on the audio signal, whereby an audio
signal is created in which the data signal is inaudibly
introduced. On the one hand, the data signal can be used to
establish the range of a transmitter. On the other hand,
the data signal can be used for the identification of audio
signals in order to easily identify possible pirate copies,
since every sound carrier, for example, a compact disc, is
provided with an individual identification ex works. A
further described application of the data signal is the
remote control of audio devices, analogous to the "VPS"
method on television.

This method is highly secure against music pirates since,
on the one hand, they are probably not aware that the piece
of music that they are copying is marked. Apart from that,
it is almost impossible to extract the data signal, which is
inaudibly present in the audio signal, without an
authorised decoder.

Audio signals are 16 bit PCM samples when they come from a
compact disc. A music pirate could, for example, manipulate
the sampling rate or the levels or phases of samples to
make the data signal unreadable, i.e., undecodable, whereby
the copyright information would also be removed from the
audio signal. This, however, will not be possible without
significant quality losses. Data that are introduced into
audio signals in such a way can therefore, analogous to
bank notes, also be referred to as "watermarks".

The method described in WO 97/33391 for introducing an
inaudible data signal into an audio signal works on audio
samples that are present as time domain samples. It is
therefore necessary that audio pieces, i.e., pieces of
music, radio plays, etc., be present as a sequence of time
domain samples in order to be provided with a watermark.
This has the disadvantage that this method cannot be used
for already-compressed data streams that have been
processed, for example, according to one of the MPEG
methods. This means that a provider of pieces of music who
wants to provide the pieces of music with a watermark prior
to shipment to the customer has to store the pieces of music
as a sequence of PCM samples. As a result, the music
provider needs a very high storage capacity. It would,
however, be desirable to use very effective audio
compression methods already for storing the audio data at
the provider.

A provider for audio data of the above-described type
could, of course, simply compress all pieces of music, for
example, by using the standard MPEG-2 AAC (ISO/IEC 13818-7),
and then decompress them fully again before an audio piece
is to be provided with a watermark, in order to have a
sequence of audio samples again that is then fed into a
known apparatus for introducing an inaudible data signal in
order to introduce a watermark. This requires a significant
effort in that, prior to the introduction of information
into the audio signal, a full decompression or decoding is
necessary. Such a decoding costs time and money. A much more
serious aspect, however, is the fact that in such a
procedure tandem encoding effects occur.

A further disadvantage of this procedure is that, because
the watermark is introduced into the PCM data, there is no
certainty as to whether the watermark is still present after
an audio compression. When PCM data provided with
watermarks are encoded at a relatively low bit rate, the
encoder introduces a lot of quantizing noise due to the
relatively low bit rate, which will, in an extreme case,
lead to the fact that no watermark can be decoded anymore.
It is also problematic that with this procedure, the bit
rate of the audio encoder that encodes the PCM data provided
with watermarks is not known in advance, which is why no
reliable control of the ratio between watermark energy and
noise energy due to the quantizing noise is possible.

It is known that audio encoding methods according to one of
the MPEG standards are not lossless encoding methods, but
lossy encoding methods. Bit savings in comparison to direct
transmission of audio samples in the time domain are
achieved, to a large part, by making use of psychoacoustic
masking effects. In particular, for a block of, for example,
2048 audio samples, the psychoacoustic masking threshold is
established as a function of frequency, whereupon, after a
time-frequency transformation of the audio samples, the
spectral values forming the short-term spectrum are
quantized under consideration of this psychoacoustic masking
threshold. In other words, the quantizer step size is
controlled such that the noise energy introduced by
quantizing is smaller than or equal to the psychoacoustic
masking threshold. In areas of the audio signal where the
masking index, i.e., the ratio of audio signal energy to the
psychoacoustic masking threshold, is very small, like, for
example, in very noisy areas of the audio signal, the
spectral values need to be only roughly quantized, without
audible interferences occurring after a subsequent decoding.
In other areas where the audio signal is very tonal, it has
to be quantized more finely, such that relatively small
noise energy results due to the quantizing, since the
masking index is very large.
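
The relation between quantizer step size and masking threshold
can be illustrated with a minimal sketch. It assumes a uniform
quantizer, for which the textbook approximation of the noise
energy is step**2 / 12; actual MPEG encoders use different
quantizers and iterative rate loops, so the function names and
numbers below are purely illustrative and not taken from the
text.

```python
import math


def quantization_noise_energy(step: float) -> float:
    """Approximate noise energy of a uniform quantizer with the given step size."""
    return step * step / 12.0


def max_step_size(masking_threshold_energy: float) -> float:
    """Largest uniform step size whose noise energy does not exceed the threshold."""
    return math.sqrt(12.0 * masking_threshold_energy)


# Hypothetical thresholds: a noisy band tolerates much more noise than a tonal band.
for label, threshold in [("noisy band", 4.0), ("tonal band", 0.01)]:
    step = max_step_size(threshold)
    print(label, "step:", round(step, 4),
          "noise:", round(quantization_noise_energy(step), 4))
```

A high threshold thus permits a coarse step size, while a low
threshold forces a fine one.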

It becomes clear from the above that, due to the quantizing
procedure, information of the original audio signal is lost.
This does not matter when the quantized audio signal is
decoded again, since the noise energy due to the quantizing
has been distributed in such a way that it remains below the
psychoacoustic masking threshold and will, therefore, be
inaudible when an ideal psychoacoustic model has been used.
These considerations, however, always apply only to a
certain short-term spectrum or to a block of, for example,
2048 successive audio values, respectively. After the
decoding, however, the block of audio samples no longer
contains any information about how the blocks were formed.
When the known apparatus for introducing information is
used, which in most cases has a certain delay compared to an
audio encoder that does not introduce information, it can
therefore not be assumed that the same block partitioning
takes place by chance. Instead, the block partitioning, the
short-term spectrum creation and the quantizing will take
place in a totally different block raster. A renewed
decoding will then usually lead to clearly audible
interferences, since it does not refer to the same
short-term spectrum, but to different short-term spectra.
This appearance of audible interferences through two
encoding/decoding stages due to their different
partitioning of the stream of audio samples into blocks is
referred to as the tandem encoding effect.

It should be noted that, in general, introducing the
inaudible data signal introduces noise energy into the audio
signal, which already includes noise energy due to the
quantizing procedure, which is not infinitely fine.
Introducing the inaudible data signal therefore tends to
lead to a deterioration of the audio quality unless special
precautions are taken. In this connection, a further
introduction of noise energy due to the tandem encoding
effects previously described is even less desirable, since
this quality loss appears systematically without any
benefit, while small quality deteriorations due to the
watermark are more acceptable, since the watermark also has
an advantage. Tandem encoding effects, however, only cause
interferences and have no advantage at all.

U.S. Patent No. 5,687,191 discloses a concept for
transmitting hidden data after data compression. An audio
signal is converted into sub-band samples via a sub-band
encoder, wherein each sub-band filter generates a sequence
of time samples whose spectral bandwidth is the same as the
bandwidth of the respective sub-band filter. A data stream
with such quantized sub-band samples is unpacked and
demultiplexed in order to perform an inverse quantizing,
such that sub-band samples are present again. Further, a
pseudo noise spread sequence is filtered by a sub-band
filter bank to obtain, for every filter of the sub-band
filter bank, a sequence of time sub-band samples having a
bandwidth determined by the respective sub-band filter. The
data to be transported are subjected to a forward error
correction and a power control ensuring that the auxiliary
data signal is below the quantizing noise floor of the audio
sub-band samples. The auxiliary data values processed in
this way are then combined with respective sub-band values
of the pseudo noise spread sequence via respective
modulators and then XORed with the unpacked sub-band values
of the audio signal. The combined sub-band values obtained
in this way are then quantized again and packed in order to
obtain an output data stream.

Summary of the Invention

It is the object of the present invention to provide a
concept that makes it possible to provide audio pieces with
a watermark, while the effects of the watermark on the audio
quality should be as low as possible.

In accordance with a first aspect of the invention, this
object is achieved by a method for introducing information
into a data stream including data about spectral values
representing a short-term spectrum of an audio signal,
including: processing the data stream to obtain the spectral
values of the short-term spectrum of the audio signal;
combining the information with a spread sequence to obtain a
spread information signal; generating a spectral
representation of the spread information signal to obtain a
spectral spread information signal; establishing
psychoacoustic maskable noise energy as a function of
frequency for the short-term spectrum of the audio signal,
wherein the psychoacoustic maskable noise energy is smaller
than or equal to the psychoacoustic masking threshold of the
short-term spectrum; weighting the spectral spread
information signal by using the established noise energy to
generate a weighted information signal, wherein the energy
of the introduced information is substantially equal to or
below the psychoacoustic masking threshold; summing the
weighted information signal with the spectral values of the
short-term spectrum of the audio signal to obtain sum
spectral values including the short-term spectrum of the
audio signal and the information; and processing the sum
spectral values to obtain a processed data stream including
the data about the spectral values of the short-term
spectrum of the audio signal and the information to be
introduced.

In accordance with a second aspect of the invention, this
object is achieved by a method for encoding an audio signal,
including: generating a short-term spectrum of the audio
signal including a plurality of spectral values; computing
the psychoacoustic masking threshold of the audio signal
using a psychoacoustic model; quantizing the spectral values
considering the psychoacoustic masking threshold so that the
noise energy introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined amount;
and forming a bit stream including values corresponding to
the quantized spectral values of the short-term spectrum.
In accordance with a third aspect of the invention, this
object is achieved by an apparatus for introducing
information into a data stream including data about spectral
values representing a short-term spectrum of an audio
signal, including: a processor for processing the data
stream to obtain the spectral values of the short-term
spectrum of the audio signal; a combiner for combining the
information with a spread sequence to obtain a spread
information signal; a generator for generating a spectral
representation of the spread information signal to obtain a
spectral spread information signal; an establisher for
establishing psychoacoustic maskable noise energy as a
function of the frequency for the short-term spectrum of the
audio signal, wherein the psychoacoustic maskable noise
energy is smaller than or equal to the psychoacoustic
masking threshold of the short-term spectrum; a weighter for
weighting the spectral spread information signal by using
the established noise energy to generate a weighted
information signal, wherein the energy of the introduced
information is substantially equal to or below the
psychoacoustic masking threshold; a summer for summing the
weighted information signal with the spectral values of the
short-term spectrum of the audio signal to obtain spectral
values including the short-term spectrum of the audio signal
and the information; and another processor for processing
the sum spectral values to obtain a processed data stream
including the data about the spectral values of the
short-term spectrum of the audio signal and the information
to be introduced.

In accordance with a fourth aspect of the invention, this
object is achieved by an apparatus for encoding an audio
signal, including: a generator for generating a short-term
spectrum of the audio signal including a plurality of
spectral values; a calculator for computing a psychoacoustic
masking threshold of the audio signal using a psychoacoustic
model; a quantizer for quantizing spectral values
considering the psychoacoustic masking threshold so that the
noise energy introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined amount;
and a bitstream formatter for forming a bit stream including
values corresponding to the quantized spectral values of the
short-term spectrum.

The present invention is based on the finding that a
complete decoding before inserting the watermark has to be
dispensed with. Instead, a data stream including spectral
values representing a short-term spectrum of an audio signal
is, according to the invention, only partly "unpacked" until
the spectral values are present. The unpacking is, however,
not a complete decoding, but only a partial decoding in
which all the information about the block forming or the
block raster used in the original encoder, respectively, is
not touched.

This is achieved by carrying out the inventive method with
spectral values and not with time samples. The information
which is to be introduced into the audio signal is combined
with a spread sequence in the sense of a spread spectrum
modulation in order to obtain a spread information signal.
Afterwards, a spectral representation of the spread
information signal is generated, for example, by a filter
bank, an FFT, an MDCT or similar, in order to obtain a
spectral spread information signal. Now, a psychoacoustic
maskable noise energy is established as a function of
frequency for the short-term spectrum of the audio signal in
order to then weight the spectral spread information signal
by using the established noise energy, so that a weighted
information signal can be generated, the energy of which is
substantially equal to or below the psychoacoustic masking
threshold. After that, the weighted information signal is
added to the spectral values of the short-term spectrum of
the audio signal in order to obtain sum spectral values
including the short-term spectrum of the audio signal and,
additionally, the introduced information. Finally, the sum
spectral values are processed again in order to obtain a
processed data stream including the data about the spectral
values of the short-term spectrum of the audio signal and
the information to be introduced. In the case of an MPEG AAC
encoder, the processing of the sum spectral values will,
again, include the quantizing and entropy encoding, for
example, by using a Huffman code.
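
The sequence of steps just described (spreading, spectral
representation, weighting, summing) can be sketched as follows.
This is only an illustration under several assumptions that are
not taken from the text: a naive DFT stands in for the filter
bank, FFT or MDCT; the weighting scales every spectral bin so
that its energy equals the maskable noise energy of that bin;
and all names (spread, dft, weight, embed) as well as the
numeric values are hypothetical. The unpacking of the data
stream and the final requantizing and entropy encoding are
omitted.

```python
import cmath
import random


def spread(bits, pn):
    """Spread each information bit (0/1) over one period of the +/-1 PN sequence."""
    return [(1 if b else -1) * chip for b in bits for chip in pn]


def dft(x):
    """Naive DFT, standing in for the filter bank / FFT / MDCT named in the text."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def weight(spectrum, maskable_energy):
    """Scale each bin so that its energy equals the maskable noise energy of that bin."""
    out = []
    for value, energy in zip(spectrum, maskable_energy):
        mag = abs(value)
        out.append(value * (energy ** 0.5 / mag) if mag > 0 else 0j)
    return out


def embed(audio_spectrum, bits, pn, maskable_energy):
    """Add the weighted spectral spread information signal to the audio spectrum."""
    spread_signal = spread(bits, pn)
    assert len(spread_signal) == len(audio_spectrum)
    weighted = weight(dft(spread_signal), maskable_energy)
    return [a + w for a, w in zip(audio_spectrum, weighted)]


rng = random.Random(0)
pn = [rng.choice((-1, 1)) for _ in range(8)]            # hypothetical spread sequence
audio_spectrum = [complex(rng.uniform(-10, 10), 0) for _ in range(16)]
maskable_energy = [0.25] * 16                            # hypothetical per-bin energies
marked = embed(audio_spectrum, [1, 0], pn, maskable_energy)
```

In this sketch the block length of the audio spectrum is never
changed, which mirrors the point that the block raster of the
original encoder remains untouched.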

It is to be noted that the block rastering provided by the
original encoder, which produced the data stream, is thereby
not touched. Thus, no tandem encoding effects occur that
would lead to a loss of audio quality.

Apart from that, it is preferred that, in the processing
after the weighting, which comprises quantizing, the same
quantizing step size(s) as in the original bit stream is/are
used, which has the advantage that the very computing
intensive iteration loops of the quantizer do not need to be
computed again. Further, no tandem encoding effects occur
that would otherwise be unavoidable, since in the case of a
renewed computing, more or less strongly differing
quantizing step sizes could occur.
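
A minimal sketch of reusing the original step size might look as
follows. It assumes a plain uniform quantizer with a single step
size for the values shown; real encoders use other quantizer
characteristics, and the function names and numbers are
hypothetical.

```python
def dequantize(indices, step):
    """Recover spectral values from quantizer indices (uniform quantizer assumed)."""
    return [i * step for i in indices]


def requantize(values, step):
    """Quantize again with the unchanged step size; no iteration loop is needed."""
    return [round(v / step) for v in values]


step = 0.5                                   # step size taken from the original stream
original_indices = [12, -3, 0, 7]
spectral_values = dequantize(original_indices, step)

weighted_information = [0.3, -0.3, 0.3, -0.3]  # hypothetical weighted information signal
sum_values = [s + w for s, w in zip(spectral_values, weighted_information)]

new_indices = requantize(sum_values, step)
print(new_indices)
```

Because the step size is unchanged, requantizing the unmodified
spectral values reproduces the original indices exactly; only
the added information changes them.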

The inventive introduction of a watermark directly into a
data stream enables, for example, the introduction of a
customer ID during the delivery of the music to a customer,
since the procedure can be executed on modern personal
computers at a multiple of real-time speed because, among
other things, the expensive frequency-time transformation,
which would be needed with a complete decoding, is not
required.

A further advantage of the present invention is that the
music provider does not have to store the PCM samples, but
can store pre-encoded data streams, which can save a factor
in the order of 12 in storage space, and that the provider
can still introduce customer-specific watermarks without the
occurrence of additional tandem encoding effects which would
lead to an audio quality loss.

The inventive procedure can easily be implemented, since
only an additional time/frequency transformation of the
spread information signal is necessary. A further
significant advantage is that the inventive method has good
interoperability, i.e., that standard data streams can be
processed and that the same watermark decoder can be used
for watermarks according to the known methods and for
watermarks according to the inventive method. Finally, it is
a further advantage that an audio encoder can no longer
erase the watermark, since an exact control of the ratio
between quantizing noise and watermark energy exists.

It is to be noted that it is, of course, possible to remove
the watermark illegally when the data stream provided with
the watermark is decoded and then encoded again, but only
at a low bit rate. In this case, the noise energy
introduced by the quantizer will exceed the watermark
energy, so that no watermark can be extracted from the audio
signal anymore. This is not a problem, however, since the
audio quality of the audio signal has decreased so strongly
due to the high quantizing noise that such a poor audio
signal does not have to be protected any longer. If the
watermark in an audio signal is destroyed, then its quality
is also destroyed.

The psychoacoustic maskable noise energy can be established
in different ways. The first option is to use a
psychoacoustic model for establishing the psychoacoustic
maskable noise energy, which generates the psychoacoustic
masking threshold as a function of frequency from the
short-term spectrum. A plurality of psychoacoustic models
exists; those psychoacoustic models which work with spectral
values of the short-term spectrum anyway are especially
advantageous, since these spectral values are directly
present due to the partial unpacking of the data stream.
However, other psychoacoustic models, which are developed
for time domain data, can be used alternatively, wherein
here, in contrast to the above-described option, a
frequency-time transformation would be necessary. Although
calculating a psychoacoustic model in order to obtain the
psychoacoustic masking threshold of the short-term spectrum
is relatively computing-time-intensive, this possibility
does offer the decisive advantage that no tandem encoding
effects are generated, since the block rastering is not
touched.

Another option, more favourable in terms of computing time,
for establishing the psychoacoustic maskable noise energy is
to generate the data stream in such a way that it comprises,
apart from the spectral values and the usual side
information, also the psychoacoustic masking threshold as a
function of frequency for every short-term spectrum.
Establishing the psychoacoustic maskable noise energy then
functions simply by extracting the psychoacoustic masking
threshold transmitted in the data stream. With this
possibility and the possibility described above where the
psychoacoustic model is computed, the psychoacoustic
maskable noise energy is the psychoacoustic masking
threshold itself. The disadvantage of transmitting the
psychoacoustic masking threshold in the data stream is the
fact that a special audio encoder is needed, since with
common audio encoding the psychoacoustic masking threshold
is not transmitted, but only the spectral values and the
respective scale factors. In closed systems, however,
compatibility with standard data streams is not required.
Therefore, this option can be implemented here with little
effort and favourable computing time.

A further possibility is to provide a special audio encoder
whose quantizer always works in such a way that the
quantizing noise is lower than the psychoacoustic masking
threshold by a predetermined amount. This means that the
encoder is designed so that its quantizer quantizes slightly
finer than it would usually have to, such that additional
noise energy can be added without any noise being audible.
This additional noise energy can then be "used up" when
introducing the information into the data stream. In the
case of an optimum psychoacoustic model, this possibility
leads to a data stream with an introduced watermark that has
suffered no quality deterioration at all. The disadvantage
of this method is, as with the direct transmission of the
psychoacoustic masking threshold, that it is not compatible
with common encoders.

Another possibility for establishing the psychoacoustic
maskable noise energy is to establish the noise energy that
has, in fact, been introduced by the quantizing in the
encoder which has generated the data stream, and to derive
from it the noise energy used in weighting. This option
assumes that the encoder has quantized such that the noise
energy was below the psychoacoustic masking threshold or
only slightly above it. Like the method described as the
first possibility, this method can use standard bit streams,
since only the spectral values and the scale factors, both
of which are present in the data stream, are needed in order
to obtain the psychoacoustic maskable noise energy. From the
scale factors, the step size of the quantizer associated
with the respective scale factor can be established in order
to compute the noise energy introduced into a scale factor
band, which is typically equal to the psychoacoustic masking
threshold or below it. The psychoacoustic maskable noise
energy used in weighting the introduced information can be
the same as the quantizing noise energy, but it can also be
scaled by a factor greater than zero and smaller than one. A
factor closer to zero leads to less audible interference due
to the watermark, but could make extraction more problematic
than a factor closer to one.

Brief Description of the Drawings

Preferred embodiments of the present invention will be
discussed in detail below with reference to the accompanying
drawings. They show:

Fig. 1   a block diagram of an inventive apparatus for
         introducing information into a data stream;

Fig. 2   a detailed block diagram of the watermark means
         of Fig. 1;

Fig. 3a  a schematic representation of a method for
         establishing the maskable noise energy using the
         psychoacoustic model;

Fig. 3b  a schematic representation of a method for
         establishing the maskable noise energy when the
         psychoacoustic masking threshold is transmitted in
         the data stream;

Fig. 3c  a schematic representation of a method for
         establishing the maskable noise energy when the
         noise energy is estimated with the knowledge of the
         spectral values and the scale factors;

Fig. 3d  a schematic representation of a method for
         establishing the psychoacoustic maskable noise
         energy when energy in the data stream is kept free
         for the watermark; and

Fig. 4   a block diagram of an inventive audio encoder
         that either writes the psychoacoustic masking
         threshold into the data stream or writes the
         predetermined amount for the method described in
         Fig. 3d into the data stream and whose quantizer
         is controlled accordingly.

Detailed Description of Preferred Embodiments 

Before the individual figures are referred to in more
detail, the system-theoretical background of the present
invention will be briefly discussed. In general, the
introduction of information into the audio signal should not
lead to an audible quality deterioration of the audio
signal, or only to a barely audible one. In order to
ascertain how much energy the signal representing the
information to be introduced may have, the masking threshold
of the audio signal is continuously computed by using a
psychoacoustic model. The frequency-selective computing of
the masking threshold by using, for example, the critical
bands, as well as a plurality of further psychoacoustic
models, is known in the art. As an example, reference is
made to the MPEG-2 AAC standard (ISO/IEC 13818-7).

The psychoacoustic model leads to a masking threshold for a
short-term spectrum of the audio signal. Usually, the
masking threshold will vary across frequency. As a matter of
definition, it is assumed that a signal introduced into the
audio signal will be inaudible when the energy of this
signal is below the masking threshold. The masking threshold
strongly depends on the composition of the audio signal.
Noisy signals have a higher masking threshold than very
tonal signals. The energy of the signal that is introduced
into the audio signal therefore varies strongly over time.
Usually, a certain signal/noise ratio is needed for decoding
the information introduced into an audio signal. It can thus
happen that, with very tonal audio signal portions, the
energy of the additionally introduced signal becomes so low
that the signal/noise ratio is no longer sufficient for
reliable decoding. In such areas, a decoder can therefore no
longer correctly decode the individual bits. From a
system-theoretical point of view, the introduction of
information into an audio signal in dependence on the
psychoacoustic masking threshold can therefore be seen as
the transmission of a data signal via a channel with
strongly varying noise energy, wherein the audio signal,
i.e., the music signal, is seen as an interference signal.

Fig. 1 shows a block diagram of an inventive apparatus or
an inventive method for introducing information into a data
stream including spectral values representing a short-term
spectrum of an audio signal. The data stream applied to the
input of a data stream demultiplexer 10 will, if it is
processed according to the above-mentioned MPEG AAC
standard, generally first be partitioned into spectral
values on a line 12 and side information on a line 14,
wherein, from the side information, the scale factors should
be particularly named here. The spectral values, which are
still entropy encoded after the demultiplexer 10, will then
be fed into an entropy decoder 16 and then into an inverse
quantizer 18, which generates the spectral values
representing the short-term spectrum of the audio signal by
using the quantized spectral values and the associated scale
factors supplied to the inverse quantizer 18 via line 14.
The spectral values will then be fed into watermark means
20, which generates sum spectral values including the
short-term spectrum of the audio signal and, in addition,
the information to be introduced. These sum spectral values
will then, again, be fed into a quantizer 22 and entropy
encoded in a following entropy encoder 24 in order to
finally be led to a data stream multiplexer 26, which also
receives the necessary side information like, for example,
the scale factors. At the output of the multiplexer 26, a
processed data stream is then present which differs from the
data stream at the input of the demultiplexer 10 only in
that it has a watermark, i.e., that information has been
introduced into it.

Before Fig. 2, which includes a detailed representation of
watermark means 20, is discussed in more detail, reference
is made, for ease of understanding, to an MPEG-2 AAC audio
encoder as described, for example, in appendix B of the
standard ISO/IEC 13818-7:1997(E) as an informative part.
Such an encoder is substantially based on the idea of
bringing the quantizing noise below the so-called
psychoacoustic masking threshold, i.e., of hiding it. For
the transformation of the audio samples into the frequency
domain, i.e., for generating the spectral representation of
the audio signal, an analysis filter bank is used which is
realised as a critically sampled DCT (DCT = discrete cosine
transform) and which has a degree of overlapping of 50%. Its
purpose is to create a spectral representation of the input
signal that will finally be quantized and encoded. Thus,
together with a corresponding filter bank in the decoder, an
analysis/synthesis system is formed.

The psychoacoustic model used in such encoders is based on
the psychoacoustic phenomenon of masking. Both frequency
domain masking effects and time domain masking effects can
be modelled that way. The psychoacoustic model provides an
estimated value for the "noise" energy that can be added to
the original audio signal without audible interference
appearing. This maximum admissible energy is referred to as
the psychoacoustic masking threshold.

The quantizer 22 and the encoder 24 in Fig. 1 will be
described below. Typically, more than one spectral line will
be quantized with the same quantizer step size. Therefore,
several adjacent spectral lines are grouped into so-called
scale factor bands. The quantizer optimises the quantizer
step size for each scale factor band. The quantizer step
size is determined such that the quantization error is below
or equal to the computed psychoacoustic masking threshold in
order to make sure that the quantizing noise is inaudible.
Two limits have to be considered here, between which a
compromise has to be found. On the one hand, the bit
consumption should be kept as low as possible in order to
obtain high compression ratios, i.e., a high encoding gain.
On the other hand, it has to be made sure that the
quantizing noise is below the psychoacoustic masking
threshold, so that no interference is audible in the encoded
and redecoded audio signal. Typically, this optimisation is
computed in an iterative loop. The result of this loop is a
quantizer step size which uniquely corresponds to a scale
factor for a scale factor band. In other words, the spectral
values of a scale factor band will be quantized with a
quantizer step size which is uniquely allocated to the scale
factor responsible for the scale factor band. This means
that two different scale factors can also lead to two
different quantizer step sizes.

The bit stream is composed by a bit stream multiplexer,
which mainly fulfils formatting tasks. The data stream,
which is a bit stream in the case of a binary system, thus
comprises the quantized and encoded spectral values or
spectral coefficients as well as the scale factors and
further side information, which are represented and
explained in detail in the above-mentioned MPEG-AAC
standard.

Fig. 2 shows a detailed block diagram of watermark means 20
of Fig. 1. From a source 30 for information units,
information units, preferably in the form of bits, are fed
into means 32 for spreading. Means 32 for spreading is
basically based on a spread spectrum modulation, which is
especially favourable when a pseudo noise spread sequence is
used, with a view to a correlation in the watermark
extractor. The information will be combined with the spread
sequence bit-by-bit. The combining preferably takes place so
that, for an information bit with a logic level of +1, the
spread sequence will be generated unchanged at the output of
means 32, while for an information bit with a logic level of
0, which can, for example, correspond to a voltage level of
-1, the inverse spread sequence is generated at the output
of means 32. Thereby, a "time signal" is generated at the
output of means 32, which comprises the spread information
from the source 30 for information. This spread information
signal will then be transferred into its spectral
representation by means 34 for transforming, which can be an
FFT algorithm, an MDCT, etc., but also a filter bank. The
spectral representation of the spread information signal
will be weighted in means 36 in order to then be added to
the spectral values in means 38 in such a way that, at the
output of means 38, the sum spectral values will be present,
which can then be quantized (22) and encoded (24) as
described with reference to Fig. 1 in order to be fed into
the bit stream multiplexer 26. Watermark means 20 further
comprises means 40 for establishing the maskable noise
energy for the short-term spectrum, which is given by the
spectral values.
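
The bit-by-bit combining performed by means 32 can be
illustrated with a minimal Python sketch. It is only an
illustration under stated assumptions, not the implementation
of the invention: the function names spread_bits and
despread, the sequence length of 31 chips and the randomly
drawn +1/-1 sequence are hypothetical, and the correlation
step merely indicates how an extractor might reuse the same
spread sequence.

```python
import random

def spread_bits(bits, pn_sequence):
    """Emit the spread sequence unchanged for a 1 bit and inverted for a 0 bit."""
    out = []
    for bit in bits:
        sign = 1 if bit == 1 else -1          # logic 0 is mapped to level -1
        out.extend(sign * chip for chip in pn_sequence)
    return out

def despread(signal, pn_sequence):
    """Recover bits by correlating each block with the spread sequence."""
    n = len(pn_sequence)
    bits = []
    for start in range(0, len(signal), n):
        block = signal[start:start + n]
        correlation = sum(s * c for s, c in zip(block, pn_sequence))
        bits.append(1 if correlation > 0 else 0)
    return bits

# Hypothetical example data: a 31-chip +1/-1 sequence and five information bits.
rng = random.Random(0)
pn = [rng.choice((-1, 1)) for _ in range(31)]
bits = [1, 0, 1, 1, 0]
spread = spread_bits(bits, pn)    # the "time signal" at the output of means 32
recovered = despread(spread, pn)
```

In this sketch every information bit occupies one full period
of the spread sequence, so the correlation for a block is
either plus or minus the sequence length when no audio signal
or noise is present.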

It has to be noted that means 34 for transforming the
spread information signal preferably performs a spectral
transformation corresponding to the transformation
underlying the data stream at the input of the demultiplexer
10 (Fig. 1). This means that means 34 for transforming
preferably performs the same modified discrete cosine
transform which has originally been used for generating the
non-processed data stream. This can easily be done, since
information like, for example, window type, window shape,
window length, etc., is transmitted as side information in
the bit stream. This connection is indicated by the broken
line in Fig. 2 coming from the bit stream demultiplexer 10
(Fig. 1).

As already explained with reference to Fig. 1, after the
addition in the summator 38 the sum spectral values will be
subjected to quantizing and encoding again. The question
arises here as to how the quantizer interval, i.e., the
quantizer step size which has already been referred to, is
to be determined, i.e., whether the iterations have to be
performed again or not. Due to the fact that the watermark
energy is usually very small compared to the audio signal
energy, the same scale factors as in the original bit stream
can preferably be used. This is represented in Fig. 1 by the
connecting line 14 from demultiplexer 10 to multiplexer 26.
This means that quantizing can be performed much more easily
by the quantizer 22, since it is no longer necessary (but
still possible) to carry out the iteration loop in order to
determine an optimum compromise between bit rate and
quantizer step size. Instead, the scale factors already
known are preferably used.
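
The reuse of the known quantizer step size can be sketched
as follows. This is a simplified illustration and not the
bit-exact MPEG-2 AAC quantizer: it assumes a power-law
quantizer with step q = 2^(QS/4) and compression factor 3/4,
omits the rounding offset and scale factor conventions of
the standard, and the names quantize, dequantize and embed
are hypothetical.

```python
import math

ALPHA = 0.75  # compression factor assumed for the power-law quantizer

def step(qs):
    """Quantizer step q for a step-size index QS (assumed q = 2**(QS/4))."""
    return 2.0 ** (qs / 4.0)

def quantize(value, qs, alpha=ALPHA):
    """Compress the magnitude with the power law and round to an integer index."""
    magnitude = (abs(value) / step(qs)) ** alpha
    return int(math.copysign(round(magnitude), value))

def dequantize(index, qs, alpha=ALPHA):
    """Inverse quantizer: expand the index back to a spectral value."""
    return math.copysign(abs(index) ** (1.0 / alpha) * step(qs), index)

def embed(indices, qs, weighted_watermark):
    """Dequantize, add the weighted watermark, requantize with the same QS."""
    spectrum = [dequantize(i, qs) for i in indices]
    summed = [s + w for s, w in zip(spectrum, weighted_watermark)]
    return [quantize(v, qs) for v in summed]

# Hypothetical indices of one scale factor band and its step-size index.
original = [3, -2, 0, 5]
qs = 8
unchanged = embed(original, qs, [0.0] * 4)   # no watermark: indices survive
changed = embed([3], qs, [5.0])              # large enough to move the index
lost = embed([3], qs, [1.0])                 # rounds back to the old index
```

No iteration loop is needed here because QS is taken over
from the original data stream. In this sketch a small added
value can round back to the original index, which is one
reason the weighting step tries to use as much of the
maskable energy as is allowed.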

In the following, the various possibilities for establishing
the noise energy maskable by the short-term spectrum, which
is needed for weighting the spectral representation of the
spread information signal, will be described. Various
possibilities exist, which will subsequently be discussed
with reference to Figs. 3a to 3d.

In Fig. 3a, a psychoacoustic model is used to compute the
psychoacoustic masking threshold of the respective
short-term spectrum by using the spectral values of the
audio signal. Since psychoacoustic models are described in
the literature and in the standard mentioned, it is only
mentioned here that preferably those psychoacoustic models
are used which work with spectral data anyway, that is,
models which normally include a time/frequency
transformation. In this case, the psychoacoustic model is
simplified compared to the original psychoacoustic model
underlying every encoder in that it can be "fed" immediately
with spectral values, so that no transformation between the
time and frequency domains is required in the psychoacoustic
model at all. Finally, the psychoacoustic model will output
the psychoacoustic masking threshold for the short-term
spectrum, such that in block 36 (Fig. 2) the spectrum of the
spread information signal can be shaped so that, in every
scale factor band, it has an energy which is equal to or
below the psychoacoustic masking threshold in this scale
factor band. It has to be noted that the psychoacoustic
masking threshold is an energy. It is desired that the
energy of the spectral representation of the information
signal is as close to the psychoacoustic masking threshold
as possible, so that the information is introduced into the
audio signal with as much energy as possible and the
correlation peaks obtained in an extractor of the watermark
are as good as possible.
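
The shaping in block 36 can be illustrated by a short Python
sketch that scales each scale factor band of the spectral
watermark so that its energy matches a given masking
threshold. The band edges, threshold values and function
names below are hypothetical, and the optional factor is an
assumption that allows the energy to stay below the
threshold; the sketch does not compute a psychoacoustic model
itself.

```python
import math

def band_energies(spectrum, band_edges):
    """Energy (sum of squares) of each scale factor band."""
    return [sum(v * v for v in spectrum[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def weight_to_threshold(spectrum, band_edges, thresholds, factor=1.0):
    """Scale every band so that its energy equals factor * threshold."""
    weighted = list(spectrum)
    energies = band_energies(spectrum, band_edges)
    for band, energy in enumerate(energies):
        if energy == 0.0:
            continue                      # an empty band cannot be shaped
        gain = math.sqrt(factor * thresholds[band] / energy)
        for k in range(band_edges[band], band_edges[band + 1]):
            weighted[k] = spectrum[k] * gain
    return weighted

# Hypothetical spectral watermark with two scale factor bands of four lines.
watermark_spectrum = [1.0, -1.0, 2.0, 0.5, -0.5, 1.5, 1.0, -2.0]
band_edges = [0, 4, 8]
masking_thresholds = [0.04, 0.25]         # energies per band (assumed values)

weighted = weight_to_threshold(watermark_spectrum, band_edges, masking_thresholds)
weighted_energies = band_energies(weighted, band_edges)
half_energies = band_energies(
    weight_to_threshold(watermark_spectrum, band_edges, masking_thresholds, 0.5),
    band_edges)
```

Because the threshold is an energy, the gain applied to the
spectral lines is the square root of the energy ratio.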

The first possibility shown in Fig. 3a has the advantage
that the psychoacoustic masking threshold can be computed
very exactly and that this method is fully compatible with
common data streams. The disadvantage is that the
computation of a psychoacoustic model is usually relatively
time-consuming. This possibility is thus very accurate and
interoperable, but takes a lot of time.

Another possibility for obtaining the psychoacoustic
maskable noise energy, shown in Fig. 3b, consists of writing
the psychoacoustic masking threshold for every short-term
spectrum into the bit stream in the encoder that has
generated the data stream at the input of the demultiplexer
10 (Fig. 1), such that the inventive apparatus for
introducing information into a data stream merely needs to
extract (40b) the psychoacoustic masking threshold for each
short-term spectrum from the side information of the data
stream in order to output the psychoacoustic masking
threshold to means 36 for weighting the spectral
representation of the spread information signal (Fig. 2).
This possibility has the advantage that it is also very
exact and, apart from that, very fast, since the threshold
only has to be accessed and not computed. However, the
interoperability is affected, i.e., standard bit streams
cannot be provided with a watermark later, since they do not
contain psychoacoustic masking thresholds. Therefore, an
inventive special encoder as described in Fig. 4 is needed
here.

Another possibility for establishing the psychoacoustic
maskable noise energy is shown in Fig. 3c. Here, the
psychoacoustic maskable noise energy is computed (40c) by
using the spectral values and the scale factors. It is
assumed that the original encoder that has generated the
data stream into which the watermark is to be introduced has
already chosen the noise energy introduced by quantizing
such that it is below or equal to the psychoacoustic masking
threshold. This method is slightly less exact than the
direct computing of the psychoacoustic masking threshold,
but in comparison it is very fast and also maintains the
interoperability, i.e., it also works with standard bit
streams.

In the following, it will be addressed why the third
possibility is slightly less exact. Several encoding
approaches exist which differ, for example, in the quantizer
implementations being used. As has already been described, a
quantizer may not exceed the specified bit rate. On the
other hand, it has to observe the psychoacoustic masking
threshold. It can thus happen that a quantizer does not need
the available bit rate at all, for example when a high bit
rate is available or when a piece of music allowing a very
high encoding gain has to be encoded, as is the case with
tonal pieces, for example. Certain quantizers work such that
they quantize finer than necessary and thus introduce much
less noise energy into the audio signal through quantizing
than they would be allowed to. It can therefore happen that
the inventive apparatus as described in Fig. 3c assumes that
the psychoacoustic masking threshold is much lower than it
actually is, which finally leads to the spectral
representation of the spread information signal having,
after weighting, much less energy than it would be allowed
to have, whereby not all of the energy that the watermark is
allowed to have is used. This would, however, not be the
case when a quantizer is used which always introduces the
maximum allowable noise energy during quantizing and either
does not write any remaining bits or fills them with
arbitrary values not taken into consideration during
decoding. In this case, the option illustrated in Fig. 3c
would be exactly as accurate as the first two possibilities.
In the case of the variable quantizer, however, a variable
bit rate is created as well. In this case, the watermark
means could also be used to make the bit rate constant by
filling up with bits representing the watermark, so that the
constant bit rate is the same as the highest bit rate of the
original data stream with variable bit rates.

In the following, it will be addressed how the noise energy
which has been introduced by quantizing into a scale factor
band is computed by using the spectral values, the scale
factors and, beyond that, the characteristic of the
quantizer. Here, the following equation applies for the
energy $|F_{x_i}|^2$ of the quantization error for a
spectral value $x_i$:

$|F_{x_i}|^2 = \frac{q^{2\alpha}}{12\,\alpha^2} \cdot x_i^{2(1-\alpha)}$

It has to be noted that this equation applies to non-uniform
quantizers as they are provided, for example, in the
MPEG-AAC standard. For uniform quantizers, the second term
simply drops out when 1 is inserted for $\alpha$.

The factor $q$ appearing in the equation is linked to the
quantizer step size QS as follows:

$q = 2^{QS/4}$

The factor $\alpha$ is 3/4 for the MPEG-AAC quantizer.

The energy of the quantization error in a scale factor band
is then the sum of $|F_{x_i}|^2$ over the scale factor band.
This energy has to be smaller than or equal to the
psychoacoustic masking threshold in this scale factor band
in order to be inaudible. It has to be noted that the
psychoacoustic masking threshold is constant within a scale
factor band, but takes different values for different scale
factor bands. For the energy of the quantization error
$x_{min}$, the following value results:

$x_{min} = \sum_i \left[ \frac{2^{\frac{3}{8} QS}}{27/4} \cdot x_i^{1/2} \right]$

The index i is to show that summing always has to be done
over the spectral values in the scale factor band, since
the psychoacoustic masking threshold is usually given as an
energy for this scale factor band.

It has to be noted that the quantizer step sizes for the
individual scale factors are not given directly in the side
information of the data stream. However, according to the
convention specified in the AAC standard, the quantizer step
size associated with every scale factor can be uniquely
derived. Apart from that, the characteristic of the
quantizer used in the original encoder for generating the
data stream has to be known, i.e., if it is a non-uniform
quantizer, its compression factor, which is the factor 3/4
in the AAC standard.
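
The estimate of the quantizing noise energy in a scale
factor band described above can be written as a few lines of
Python. This is a sketch under assumptions: the quantizer
step size QS is taken as an input because its derivation
from the scale factor follows the AAC standard and is not
reproduced here, the magnitude of each spectral value is
used so that the fractional power is defined for negative
values, and the function name and example numbers are
hypothetical.

```python
def quantization_noise_energy(spectral_values, step_size_qs, alpha=0.75):
    """Sum of q**(2a) / (12 a**2) * |x_i|**(2(1-a)) over one scale factor band."""
    q = 2.0 ** (step_size_qs / 4.0)                 # q = 2^(QS/4)
    scale = q ** (2 * alpha) / (12 * alpha ** 2)    # 2^(3QS/8) / (27/4) for a = 3/4
    return sum(scale * abs(x) ** (2 * (1 - alpha)) for x in spectral_values)

# Hypothetical scale factor band with two spectral lines and QS = 8.
band = [16.0, 4.0]
energy_nonuniform = quantization_noise_energy(band, 8)            # alpha = 3/4
energy_uniform = quantization_noise_energy(band, 8, alpha=1.0)    # q^2 / 12 per line
```

With alpha set to 1 the dependence on the spectral values
disappears and each line contributes q squared divided by
12, which is the usual noise energy of a uniform quantizer
with step q.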

As already discussed, the spectral lines of the spectral
representation of the spread information signal will now be
weighted so that, together, they have an energy that is
smaller than or equal to the psychoacoustic maskable noise
energy and, in the case of the option described in Fig. 3c,
equal to the noise energy of the quantizing process.

Consider the case that the noise energy introduced by
quantizing in the scale factor band is already equal to the
psychoacoustic masking threshold and that the same energy is
then introduced into the audio signal again, this time for
the information to be introduced. It can be seen that the
total energy, i.e., the noise energy due to quantizing plus
the energy of the information, can then exceed the
psychoacoustic masking threshold, i.e., the psychoacoustic
masking threshold is violated by a factor larger than 1.
This can lead to audible quality losses, which will,
however, be small, since the energy of the information is
limited to the psychoacoustic masking threshold. As already
explained, a watermark energy in the order of the
psychoacoustic masking threshold will lead to interference
when the quantizing noise is already in the order of the
psychoacoustic masking threshold. It is, therefore,
preferred to choose the psychoacoustic maskable noise energy
used for weighting such that the total noise energy
(quantizing noise plus "noise energy" of the information) is
smaller than 1.5 times the psychoacoustic masking threshold,
wherein even smaller factors down to close to 1.0 are
possible. It has to be noted that small factors are also
practical, since very high information redundancy has
already been introduced due to the spreading of the
information signal.
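
The limit on the total noise energy can be turned into a
small worked example. The following Python sketch computes
an upper bound for the watermark energy in one scale factor
band from the quantizing noise energy, the masking threshold
and the chosen limit factor; the function name and the
numbers are hypothetical, and since the text asks for a
total that is smaller than the limit, the result is a bound
to stay below rather than a target to reach.

```python
def watermark_energy_budget(quantization_noise, masking_threshold, limit_factor=1.5):
    """Upper bound on watermark energy so that noise + watermark
    does not exceed limit_factor * masking_threshold."""
    budget = limit_factor * masking_threshold - quantization_noise
    return max(0.0, budget)               # no budget left if the noise is too high

# Case discussed in the text: quantizing noise already equals the threshold.
threshold = 1.0
noise = 1.0
budget = watermark_energy_budget(noise, threshold)   # 0.5
weighting_factor = budget / noise                    # 0.5, relative to the noise energy

# A stricter limit factor of 1.0 leaves no room in this case.
strict_budget = watermark_energy_budget(noise, threshold, limit_factor=1.0)
```

Under these assumptions a limit factor of 1.5 corresponds to
a weighting factor of at most one half relative to the
quantizing noise energy.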

In other words, introducing a watermark into an audio signal
whose psychoacoustic masking threshold has already been
fully used up by noise energy due to quantizing leads to a
slight deterioration of the audio quality, which will,
however, be easily outweighed by the advantages of the
watermark.

In order to overcome this limitation, the concept shown in
Fig. 3d can be used, wherein the quantizer in the encoder is
controlled from the beginning such that the noise energy
introduced by quantizing is chosen, by setting the quantizer
step size, so that it always stays below the psychoacoustic
masking threshold by a predetermined amount. In other words,
an audio encoder for such a concept works such that it
quantizes finer than necessary, whereby an "energy
potential" is kept free for the information to be
introduced, i.e., for the watermark. This has the advantage
that a watermark can be introduced entirely without quality
loss when, in establishing the psychoacoustic maskable noise
energy (40d), which is now smaller than the psychoacoustic
masking threshold by a predetermined amount, the
predetermined amount is taken into account in means 40d, so
that the noise energy due to quantizing and the energy due
to the information to be introduced are together equal to or
smaller than the psychoacoustic masking threshold. Since the
weighted spectral values of the spread information signal
are summed with the spectral values of the audio signal, the
spectral values of the information signal are, after their
weighting, equal to or smaller than the predetermined
amount.

This option has the advantage that a watermark can be
introduced into a data stream without any quality loss. On
the one hand, however, the interoperability suffers, since
the quantizer in the encoder always has to stay below the
psychoacoustic masking threshold by the predetermined amount
when setting the noise energy introduced by quantizing. On
the other hand, this implementation possibility is very
efficient, since no psychoacoustic model has to be computed.

In the following, reference is made to Fig. 4, which shows
two possibilities for an encoder for audio signals to
generate a data stream which is especially suitable for
introducing information according to the invention. Such an
audio encoder can basically be constructed like a known
audio encoder, such that it comprises means 50 for
generating a spectral representation of the audio signal, a
quantizer 52 for quantizing the spectral representation of
the audio signal, an entropy encoder 54 for entropy encoding
the quantized spectral values and, finally, a data stream
multiplexer 56. The data stream multiplexer 56 receives the
psychoacoustic masking threshold from a psychoacoustic model
58, which is also known; in contrast to a known audio
encoder, the psychoacoustic masking threshold is written
into the data stream, such that the inventive apparatus for
introducing information can simply access the psychoacoustic
masking threshold in the data stream. The encoder shown in
Fig. 4 with the solid line 60 is therefore the counterpart
to the apparatus shown in Fig. 1 for introducing information
that includes the option shown in Fig. 3b as means for
establishing the maskable noise energy.

The audio encoder alternative according to the present
invention that is shown in Fig. 4 in dashed lines
corresponds to the option for means 40 shown in Fig. 3d for
establishing the maskable noise energy in the inventive
apparatus shown in Fig. 1. Here, the quantizer is controlled
by a predetermined amount such that the noise energy
introduced by quantizing is below the psychoacoustic masking
threshold by the predetermined amount, wherein the value of
the predetermined amount is fed into the data stream
multiplexer 56 via the dotted line 62 in order to be
included in the data stream, such that the inventive
apparatus for introducing information can access the
predetermined amount in order to weight accordingly (block
36 in Fig. 2).

Claims

1. Method for introducing information into a data stream
including data about spectral values representing a 
short-term spectrum of an audio signal, including: 

processing the data stream to obtain the spectral val- 
ues of the short-term spectrum of the audio signal; 

combining the information with a spread sequence to 
obtain a spread information signal; 

generating a spectral representation of the spread in- 
formation signal to obtain a spectral spread informa- 
tion signal; 

establishing psychoacoustic maskable noise energy as a
function of frequency for the short-term spectrum of the
audio signal, wherein the psychoacoustic maskable noise
energy is smaller than or equal to the psychoacoustic
masking threshold of the short-term spectrum;

weighting the spectral spread information signal by 
using the established noise energy to generate a 
weighted information signal, wherein the energy of the 
introduced information is substantially equal to or 
below the psychoacoustic masking threshold; 

summing the weighted information signal with the spec- 
tral values of the short-term spectrum of the audio 
signal to obtain sum spectral values including the 
short-term spectrum of the audio signal and the infor- 
mation; and 

processing the sum spectral values to obtain a processed
data stream including the data about the spectral values of
the short-term spectrum of the audio signal and the
information to be introduced.

2. Method according to claim 1, wherein the data stream
comprises quantized spectral values as data about
spectral values, the step of processing the data
stream including the following sub-step:

inverse quantizing the quantized spectral values to
obtain the spectral values; and

the step of processing the sum spectral values
including:

quantizing the sum spectral values to obtain quantized
sum spectral values; and

forming the processed data stream using the quantized
sum spectral values.

3. Method according to claim 2, wherein the quantized
spectral values in the data stream are entropy-encoded,
the step of processing the data stream including
the following sub-step:

entropy-decoding the entropy-encoded spectral values
to obtain the quantized spectral values; and

the step of processing the sum spectral values
including:

entropy-encoding the quantized sum spectral values.

4. Method according to claim 1, wherein the step of
establishing the psychoacoustic maskable noise energy
comprises:

computing the psychoacoustic masking threshold as a
function of frequency using a psychoacoustic model
which is based on the spectral values of the audio
signal.

5. Method according to claim 1, wherein a masking threshold
used in generating the data stream as a function of
frequency for the short-term spectrum is present in
the data stream as side information, the step of
establishing including:

extracting the psychoacoustic masking threshold from
the data stream, wherein the psychoacoustic maskable
noise energy is the same as the psychoacoustic masking
threshold.

6. Method according to claim 1, wherein the data stream
further comprises side information including scale
factors by which the spectral values were multiplied
in groups in an audio encoder prior to quantizing,
the step of processing the data stream further
including the following sub-step:

extracting the scale factors from the data stream; and

the step of establishing including:

computing the noise energy introduced in the audio
encoder by quantizing, as a function of frequency, by
using the scale factors for the short-term spectrum
and by using the spectral values as well as knowledge
of a quantizer used in the audio encoder, the introduced
noise energy being a measure for the psychoacoustic
maskable noise energy used in weighting.

7. Method according to claim 6, wherein the data stream
is formed according to ISO/IEC 13818-7 (MPEG-2 AAC)
and the step of estimating the noise energy comprises:

establishing a quantizing step for the spectral values
from a scale factor band using the scale factor
associated with this scale factor band;

evaluating the following formula to obtain the noise
energy for the scale factor band introduced by
quantizing,

$$x_{\min} = \sum_i \frac{2^{\frac{3}{8} \cdot QS}}{27/4} \cdot |x_i|^{1/2}$$

wherein $x_i$ is the i-th spectral line in a scale factor
band, QS is the quantizing step for this scale factor
band and $x_{\min}$ is the noise energy introduced in the
scale factor band by quantizing;

the step of weighting including:

setting the spectral values of the spectral representation
of the spread information signal in the scale
factor band such that the total energy of the set
spectral values is the same as the noise energy in
this scale factor band obtained in the step of
evaluating.
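
The evaluating and weighting steps of this claim can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the claimed implementation: it assumes the formula is read as xmin = sum over i of 2^(3/8 * QS) / (27/4) * |x_i|^(1/2), and the function names band_noise_energy and scale_to_energy are hypothetical.

```python
import math


def band_noise_energy(spectral_lines, qs):
    """Estimate xmin for one scale factor band.

    Assumed reading of the formula:
    xmin = sum_i 2^(3/8 * QS) / (27/4) * sqrt(|x_i|)
    """
    factor = 2.0 ** (3.0 * qs / 8.0) / (27.0 / 4.0)
    return sum(factor * math.sqrt(abs(x)) for x in spectral_lines)


def scale_to_energy(spread_lines, target_energy):
    """Scale the spread-information lines of one band so that their
    total energy (sum of squares) equals target_energy."""
    energy = sum(v * v for v in spread_lines)
    if energy == 0.0:
        # Nothing to scale: an all-zero band stays all-zero.
        return list(spread_lines)
    gain = math.sqrt(target_energy / energy)
    return [gain * v for v in spread_lines]


# Usage: weight the spread lines of one band to the estimated noise energy.
audio_band = [4.0, 9.0]
spread_band = [1.0, -1.0]
xmin = band_noise_energy(audio_band, qs=0)
weighted_band = scale_to_energy(spread_band, xmin)
```

After scaling, the sum of squares of weighted_band equals xmin, which is the condition stated in the weighting step.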

Method according to claim 1, wherein the spectral values
of the data stream are quantized such that the
noise energy introduced by quantizing is smaller than
the psychoacoustic masking threshold by a predetermined
amount, and wherein, in the step of establishing,
an energy corresponding to the predetermined amount is
established; and

wherein, in the step of weighting, the spectral values
of the spectral representation of the spread information
signal are set such that they have an energy
corresponding to the predetermined amount.

Method according to claim 1, wherein the value of the
predetermined amount is present as side information in
the data stream, and wherein, in the step of establishing,
the value of the predetermined amount is extracted from
the side information of the data stream.

Method according to claim 1, wherein, in the step of
processing the sum spectral values, the same quantizing
step sizes as in the original data stream are
used.

12. Method for encoding an audio signal, including:

generating a short-term spectrum of the audio signal
including a plurality of spectral values;

computing the psychoacoustic masking threshold of the
audio signal using a psychoacoustic model;

quantizing the spectral values considering the psychoacoustic
masking threshold so that the noise energy
introduced by quantizing is smaller than the psychoacoustic
masking threshold by a predetermined amount;

forming a bit stream including values corresponding to
the quantized spectral values of the short-term
spectrum.
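
The quantizing step of this claim can be illustrated with a small Python sketch. The assumptions are loud ones: a uniform quantizer stands in for the encoder's real quantizer, the predetermined amount is treated as a linear energy reserved_energy, and the function name quantize_band and the step-halving search are hypothetical choices made only for illustration.

```python
def quantize_band(lines, masking_threshold, reserved_energy, step=64.0):
    """Quantize one band so that the quantization noise energy stays
    below the masking threshold by at least reserved_energy.

    Assumption: uniform quantizer, step size halved until the measured
    noise energy fits the reduced target.
    """
    if not 0.0 < reserved_energy < masking_threshold:
        raise ValueError("reserved_energy must lie between 0 and the threshold")
    target = masking_threshold - reserved_energy
    while True:
        quantized = [round(x / step) for x in lines]
        noise = sum((x - step * q) ** 2 for x, q in zip(lines, quantized))
        if noise <= target:
            return quantized, step, noise
        step /= 2.0  # finer quantization lowers the noise energy


# Usage: keep half of the masking threshold free for later use.
q, used_step, noise = quantize_band([10.0, 3.0, -7.0], 1.0, 0.5, step=4.0)
```

The energy left unused below the masking threshold is the headroom that information introduced later can occupy.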

13. Method according to claim 12, wherein, in the step of
forming, an indication of the value (62) of the predetermined
amount is included in the bit stream.

Apparatus for introducing information into a data
stream including data about spectral values representing
a short-term spectrum of an audio signal, including:

a processor for processing the data stream to obtain
the spectral values of the short-term spectrum of the
audio signal;

a combiner for combining the information with a spread
sequence to obtain a spread information signal;

a generator for generating a spectral representation
of the spread information signal to obtain a spectral
spread information signal;

an establisher for establishing psychoacoustic
maskable noise energy as a function of frequency for
the short-term spectrum of the audio signal, wherein
the psychoacoustic maskable noise energy is smaller
than or equal to the psychoacoustic masking threshold
of the short-term spectrum;

a weighter for weighting the spectral spread information
signal by using the established noise energy to
generate a weighted information signal, wherein the
energy of the introduced information is substantially
equal to or below the psychoacoustic masking threshold;

a summer for summing the weighted information signal
with the spectral values of the short-term spectrum of
the audio signal to obtain sum spectral values including
the short-term spectrum of the audio signal and the
information; and

another processor for processing the sum spectral
values to obtain a processed data stream including the
data about the spectral values of the short-term spectrum
of the audio signal and the information to be
introduced.

Apparatus for encoding an audio signal, including:

a generator for generating a short-term spectrum of
the audio signal including a plurality of spectral
values;

a calculator for computing a psychoacoustic masking
threshold of the audio signal using a psychoacoustic
model;

a quantizer for quantizing spectral values considering
the psychoacoustic masking threshold so that the noise
energy introduced by quantizing is smaller than the
psychoacoustic masking threshold by a predetermined
amount;

a bitstream formatter for forming a bit stream including
values corresponding to the quantized spectral
values of the short-term spectrum.
Method and Apparatus for Introducing Information into a
Data Stream and Method and Apparatus for Encoding an
Audio Signal

Abstract

An inventive method for introducing information into a data
stream including data about spectral values representing a
short-term spectrum of an audio signal first processes the
data stream to obtain the spectral values of the short-term
spectrum of the audio signal. In addition, the information
to be introduced is combined with a spread sequence to
obtain a spread information signal, whereupon a spectral
representation of the spread information signal is generated,
which is then weighted with an established psychoacoustic
maskable noise energy to generate a weighted information
signal, wherein the energy of the introduced information is
substantially equal to or below the psychoacoustic masking
threshold. The weighted information signal and the spectral
values of the short-term spectrum of the audio signal are
then summed and afterwards processed again to obtain a
processed data stream including both the audio information
and the information to be introduced. Because the information
is introduced into the data stream without a change to the
time domain, the block rastering underlying the short-term
spectrum is not affected, so that introducing a watermark
does not lead to tandem encoding effects.
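
The spreading, weighting and summing steps summarized above can be sketched in Python. This is a minimal illustration under assumptions, not the inventive implementation: a naive DCT-II stands in for the codec's own transform, the band layout and maskable energies are supplied by the caller, and the names spread, dct_ii and embed are hypothetical.

```python
import math


def spread(bits, chips):
    """Combine information bits with a +/-1 spread sequence.
    Each bit is mapped to +1 or -1 and multiplied by every chip."""
    return [(1.0 if b else -1.0) * c for b in bits for c in chips]


def dct_ii(block):
    """Naive DCT-II, used here only as a stand-in spectral transform."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(block))
        for k in range(n)
    ]


def embed(audio_spectrum, info_spectrum, maskable_energy, bands):
    """Weight the information spectrum band by band so that its energy
    equals the maskable energy, then add it to the audio spectrum."""
    out = list(audio_spectrum)
    for (lo, hi), target in zip(bands, maskable_energy):
        energy = sum(v * v for v in info_spectrum[lo:hi])
        gain = math.sqrt(target / energy) if energy > 0.0 else 0.0
        for k in range(lo, hi):
            out[k] += gain * info_spectrum[k]
    return out


# Usage: two bands of two spectral lines each.
spread_signal = spread([1, 0], [1.0, -1.0])
summed = embed([10.0, 10.0, 10.0, 10.0], [1.0, 1.0, 2.0, 0.0],
               maskable_energy=[2.0, 1.0], bands=[(0, 2), (2, 4)])
```

Because the sum is formed on spectral values, no transform back to the time domain is involved in this sketch, which mirrors the point made about the block rastering.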
Date: May 22, 2002

Residence: 91054 Buckenhof, Germany

Citizenship: German

Post Office Address: Am Eichengarten 11, 91054 Buckenhof, Germany

(Supply similar information and signature for third and
subsequent joint inventors.)
Form PTO-FB-240 (8-83)
Patent and Trademark Office - U.S. DEPARTMENT OF COMMERCE

German Language Declaration

POWER OF ATTORNEY: As a named inventor, I hereby
appoint the following attorney(s) and/or agent(s) to prosecute
this application and transact all business in the Patent and
Trademark Office connected therewith (list name and
registration number):

Michael A. GLENN, Reg. No. 30,176
Donald M. HENDRICKS, Reg. No. 40,355
Kirk D. WONG, Reg. No. 43,284
Julia Thomas, Reg. No. P52,283
Christopher PEIL, Reg. No. 45,005

Direct Telephone Calls to (name and telephone number):

Send Correspondence to:
GLENN PATENT GROUP
3475 Edison Way, Suite L,
Menlo Park, CA 94025
U.S.A.

Full name of third joint inventor, if any: Karlheinz BRANDENBURG
Third inventor's signature: Date:
Residence: 91054 Erlangen, Germany
Citizenship: German
Post Office Address: Haagstrasse 32, 91054 Erlangen, Germany

Full name of fourth joint inventor, if any: Eric ALLAMANCHE
Fourth inventor's signature: Date: May 22, 2002
Residence: 90489 Nuernberg, Germany
Citizenship: German
Post Office Address: Sulzbacherstrasse 41, 90489 Nuernberg, Germany

(Supply similar information and signature for third and
subsequent joint inventors.)



